forked from facebook/rocksdb
-
Notifications
You must be signed in to change notification settings - Fork 97
[DNM]upgrade to 5.10.fb #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
zhangjinpeng87
wants to merge
301
commits into
tikv:release
from
zhangjinpeng87:zhangjinpeng/upgrade-to-5.10
Closed
[DNM]upgrade to 5.10.fb #14
zhangjinpeng87
wants to merge
301
commits into
tikv:release
from
zhangjinpeng87:zhangjinpeng/upgrade-to-5.10
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Summary: Our current implementation of (semi-)copy constructor of DBOptions and ColumnFamilyOptions seems to intend value by value copy, which is what the default copy constructor does anyway. Moreover not using the default constructor has the risk of forgetting to add newly added options. As an example, allow_2pc seems to be forgotten in the copy constructor which was causing one of the unit tests not seeing its effect. Closes facebook#2888 Differential Revision: D5846368 Pulled By: maysamyabandeh fbshipit-source-id: 1ee92a2aeae93886754b7bc039c3411ea2458683
Summary: By default the seq number in DB is increased once per written key. WritePrepared txns requires the seq to be increased once per the entire batch so that the seq would be used as the prepare timestamp by which the transaction is identified. Also we need to increase seq for the commit marker since it would give a unique id to the commit timestamp of transactions. Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch. Closes facebook#2885 Differential Revision: D5837843 Pulled By: maysamyabandeh fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c
Summary: In our testing cluster, we found large amount tombstone has been promoted to kValue type from kMerge after reaching the top level of compaction. Since we used to only collecting tombstone in merge operator, those tombstones can never be collected. This PR addresses the issue by adding a GC step in compaction filter, which is only for kValue type records. Since those record already reached the top of compaction (no earlier data exists) we can safely remove them in compaction filter without worrying old data appears. This PR also removes an old optimization in cassandra merge operator for single merge operands. We need to do GC even on a single operand, so the optimation does not make sense anymore. Closes facebook#2855 Reviewed By: sagar0 Differential Revision: D5806445 Pulled By: wpc fbshipit-source-id: 6eb25629d4ce917eb5e8b489f64a6aa78c7d270b
Summary: snprintf is defined as _snprintf, which doesn't exist in the std namespace. Closes facebook#2298 Differential Revision: D5070457 Pulled By: yiwu-arbug fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
Summary: This is caught when I was testing facebook#2886. Closes facebook#2907 Differential Revision: D5863153 Pulled By: yiwu-arbug fbshipit-source-id: 8c54759ba1a0dc101f24ab50423e35731300612d
Summary: Hello, current master branch declares cmake_minimum_required (VERSION 2.8.11) but cmake gives the following error: [ 6%] CMake Error at CMakeLists.txt:658 (install): install TARGETS given unknown argument "INCLUDES". CMake Error at src/CMakeLists.txt:658 (install): install TARGETS given unknown argument "INCLUDES". because this argument not supported on CMake versions prior 2.8.12 Closes facebook#2904 Differential Revision: D5863430 Pulled By: yiwu-arbug fbshipit-source-id: 0f7230e080add472ad4b87836b3104ea0b971a38
Summary: This option was introduced in the C++ API in RocksDB 5.6 in bb01c18 . Now, exposing it through RocksJava API. Closes facebook#2908 Differential Revision: D5864224 Pulled By: sagar0 fbshipit-source-id: 140aa55dcf74b14e4d11219d996735c7fdddf513
Summary: The test didn't delete txn before creating a new one. Closes facebook#2913 Differential Revision: D5880236 Pulled By: yiwu-arbug fbshipit-source-id: 7a4fcaada3d86332292754502cd8f4341143bf4f
Summary: it's unsupported in options file, so the flag should be respected by db_bench even when an options file is provided. Closes facebook#2910 Differential Revision: D5869836 Pulled By: ajkr fbshipit-source-id: f67f591ae083e95e989f86b6fad50765d2e3d855
Summary: Fix for [2461](facebook#2461). Problem: When using multiple db_paths setting with RocksDB, RocksDB incorrectly calculates the size of L1 in LevelCompactionBuilder::GetPathId. max_bytes_for_level_base is used as L0 size and L1 size is calculated as (L0 size * max_bytes_for_level_multiplier). However, L1 size should be max_bytes_for_level_base. Solution: Use max_bytes_for_level_base as L1 size. Also, use L1 size as the estimated size of L0. Closes facebook#2903 Differential Revision: D5885442 Pulled By: maysamyabandeh fbshipit-source-id: 036da1c9298d173b9b80479cc6661ee4b7a951f6
Summary: Problem: During RocksJava performance testing we found that the rocksdb jni library is not built with jemalloc; instead it was getting built with the default glibc malloc. We saw quite a bit of memory bloat due to this. Addressed this by installing jemalloc-devel package in the vm that we use to build release jars. Closes facebook#2916 Differential Revision: D5887018 Pulled By: sagar0 fbshipit-source-id: ace0b5d60234b3a30dcd5d39633e7827a5982a50
…eName APIs in RocksJava Summary: JNI wrappers for LoadLatestOptions, LoadOptionsFromFile and GetLatestOptionsFileName APIs. Closes facebook#2898 Differential Revision: D5857934 Pulled By: sagar0 fbshipit-source-id: 68b79e83eab8de9416e3f1fef73e11cf7947e90a
Summary: fixes facebook#2902 Closes facebook#2917 Differential Revision: D5887175 Pulled By: ajkr fbshipit-source-id: 364e292c636a3238bfc53b0fb9a01ff2f82dcbb9
Summary: Problem: - `DB::SanitizeOptions` strips trailing slash from `wal_dir` but not `dbname` - We check whether `wal_dir` and `dbname` refer to the same directory using string equality: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L258 - Providing `dbname` with trailing slash causes default `wal_dir` to be misidentified as a separate directory. - Then the repair tries to add all SST files to the `VersionEdit` twice (once for `dbname` dir, once for `wal_dir`) and fails with coredump. Solution: - Add a new `Env` function, `AreFilesSame`, which uses device and inode number to check whether files are the same. It's currently only implemented in `PosixEnv`. - Migrate repair to use `AreFilesSame` to check whether `dbname` and `wal_dir` are same. If unsupported, falls back to string comparison. Closes facebook#2827 Differential Revision: D5761349 Pulled By: ajkr fbshipit-source-id: c839d548678b742af1166d60b09abd94e5476238
Summary: Closes facebook#2922 Differential Revision: D5895310 Pulled By: yiwu-arbug fbshipit-source-id: 52c635a25d22478ec1eca49b6817551202babac2
Summary: Context/problem: - CFs may be flushed at different times - A WAL can only be deleted after all CFs have flushed beyond end of that WAL. - Point-in-time recovery might stop upon reaching the first corruption. - Some CFs may have already flushed beyond that point, while others haven't. We should fail the Open() instead of proceeding with inconsistent CFs. Closes facebook#2900 Differential Revision: D5863281 Pulled By: miasantreble fbshipit-source-id: 180dbaf83d96c804cff49b3c406312a4ae61313e
Summary: Closes facebook#2927 Differential Revision: D5906613 Pulled By: siying fbshipit-source-id: 607401e05b27508c816c700864fe81514606e4ef
Summary: Its timing out under tsan. Closes facebook#2928 Differential Revision: D5911766 Pulled By: maysamyabandeh fbshipit-source-id: 2faacc07752ac8713a3a2abb5a4c4b7ae3bdf208
Summary: Expose DestroyDB API in RocksJava. Closes facebook#2934 Differential Revision: D5914775 Pulled By: sagar0 fbshipit-source-id: 84af6ea0d2bccdcfb9fe8c07b2f87373f0d5bab6
Summary: Looks like the API is simply missing. Adding it. Closes facebook#2937 Differential Revision: D5919955 Pulled By: yiwu-arbug fbshipit-source-id: 6e2e9c96c29882b0bb4113d1f8efb72bffc57878
Summary: SUMMARY Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed. TEST PLAN ran make check all passed Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value. Closes facebook#2893 Reviewed By: yiwu-arbug Differential Revision: D5845814 Pulled By: TheRushingWookie fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd
Summary: A user encountered segfault on the call to `CacheDependencies()`, probably because `NewIndexIterator()` failed before populating `*index_entry`. Let's avoid the call in that case. Closes facebook#2939 Differential Revision: D5928611 Pulled By: ajkr fbshipit-source-id: 484be453dbb00e5e160e9c6a1bc933df7d80f574
Summary: Closes facebook#2942 Differential Revision: D5932858 Pulled By: maysamyabandeh fbshipit-source-id: e11f52a0b08d65149bb49d99d1dbc82cb5a96fa0
Summary: There is no need for smart pointers in cf_info_map, so use RAII. This should also placate valgrind. Closes facebook#2943 Differential Revision: D5932941 Pulled By: asandryh fbshipit-source-id: 2c37df88573a9df2557880a31193926e4425e054
Summary: For every merge operand encountered for a key in the read path we now have the ability to decide whether to look further (to retrieve more merge operands for the key) or stop and invoke the merge operator to return the value. The user needs to override `ShouldMerge()` method with a condition to terminate search when true to avail this facility. This has a couple of advantages: 1. It helps in limiting the number of merge operands that are looked at to compute a value as part of a user Get operation. 2. It allows to peek at a merge key-value to see if further merge operands need to look at. Example: Limiting the number of merge operands that are looked at: Lets say you have 10 merge operands for a key spread over various levels. If you only want RocksDB to look at the latest two merge operands instead of all 10 to compute the value, it is now possible with this PR. You can set the condition in `ShouldMerge()` to return true when the size of the operand list is 2. Look at the example implementation in the unit test. Without this PR, a Get might look at all the 10 merge operands in different levels before invoking the merge-operator. Added a new unit test. Made sure that there is no perf regression by running benchmarks. Command line to Load data: ``` TEST_TMPDIR=/dev/shm ./db_bench --benchmarks="mergerandom" --merge_operator="uint64add" --num=10000000 ... mergerandom : 12.861 micros/op 77757 ops/sec; 8.6 MB/s ( updates:10000000) ``` **ReadRandomMergeRandom bechmark results:** Command line: ``` TEST_TMPDIR=/dev/shm ./db_bench --benchmarks="readrandommergerandom" --merge_operator="uint64add" --num=10000000 ``` Base -- Without this code change (on commit fc7476b): ``` readrandommergerandom : 38.586 micros/op 25916 ops/sec; (reads:3001599 merges:6998401 total:10000000 hits:842235 maxlength:8) ``` With this code change: ``` readrandommergerandom : 38.653 micros/op 25870 ops/sec; (reads:3001599 merges:6998401 total:10000000 hits:842235 maxlength:8) ``` Closes facebook#2923 Differential Revision: D5898239 Pulled By: sagar0 fbshipit-source-id: daefa325019f77968639a75c851d46352c2303ef
Summary: The default one will try to install rocksdb:x86-windows, which would lead to failing of the build at the last step (CMake Error, Rocksdb only supports x64). Because it will try to install a serials of x86 version package, and those cannot proceed to rocksdb:x86-windows building. By using rocksdb:x64-windows, we can make sure to install x64 version. Tested on Win10 x64. Closes facebook#2941 Differential Revision: D5937139 Pulled By: sagar0 fbshipit-source-id: 15637fe23df59326a0e607bd4d5c48733e20bae3
Summary: Recover txns from the WAL. Also added some unit tests. Closes facebook#2901 Differential Revision: D5859596 Pulled By: maysamyabandeh fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0
Summary: It was broken when `NotifyCollectTableCollectorsOnFinish` was introduced. That function called `Finish` on each of the `TablePropertiesCollector`s, and `CompactOnDeletionCollector::Finish()` was resetting all its internal state. Then, when we checked whether compaction is necessary, the flag had already been cleared. Fixed above issue by avoiding resetting internal state during `Finish()`. Multiple calls to `Finish()` are allowed, but callers cannot invoke `AddUserKey()` on the collector after any finishes. Closes facebook#2936 Differential Revision: D5918659 Pulled By: ajkr fbshipit-source-id: 4f05e9d80e50ee762ba1e611d8d22620029dca6b
Summary: When using with compressed cache it is possible that the status is ok but the block is not actually added to the block cache. The patch takes this case into account. Closes facebook#2945 Differential Revision: D5937613 Pulled By: maysamyabandeh fbshipit-source-id: 5428cf1115e5046b3d01ab78d26cb181122af4c6
Summary: This template reminds the users to use issues only for bug reports. The template is written according to the github guidelines at https://help.github.com/articles/creating-an-issue-template-for-your-repository/ Closes facebook#2948 Differential Revision: D5943558 Pulled By: maysamyabandeh fbshipit-source-id: c83b5d211ea8e334107141967689b2f0c453bbc9
Summary: I experienced weird segfault because of this mismatch of type in log formatting. Fix it. Closes facebook#3345 Differential Revision: D6687224 Pulled By: siying fbshipit-source-id: c51fb1c008b7ebc3efdc353a4adad3e8f5b3e9de
Summary: Most popular versions of GCC can't identify platform on ARM if "-march=native" is specified. Remove it to unblock most people. Closes facebook#3346 Differential Revision: D6690544 Pulled By: siying fbshipit-source-id: bbaba9fe2645b6b37144b36ea75beeff88992b49
Summary: The macro was added by mistake in facebook#2372 Closes facebook#3343 Differential Revision: D6681356 Pulled By: yiwu-arbug fbshipit-source-id: 4180172fb0eaef4189c07f219241e0c261c03461
Summary: With the ZSTD dictionary generator support added in facebook#3057 `PORTABLE=1 ROCKSDB_NO_FBCODE=1 make rocksdbjavastatic` fails as it can't find zdict.h. Specifically due to: https://github.com/facebook/rocksdb/blob/e3a06f12d27fd50af7b6c5941973f529601f9a3e/util/compression.h#L39 In java static builds zstd code gets directly downloaded from https://github.com/facebook/zstd , and in there zdict.h is under dictBuilder directory. So, I modified libzstd.a target to use `make install` to collect all the header files into a single location and used that as the zstd's include path. Closes facebook#3260 Differential Revision: D6669850 Pulled By: sagar0 fbshipit-source-id: f8a7562a670e5aed4c4fb6034a921697590d7285
Summary: added support for C and asm files as required for e612e31. Closes facebook#3299 Differential Revision: D6612479 Pulled By: ajkr fbshipit-source-id: 6263ed7c1602f249460421825c76b5721f396163
Summary: Tested on a clean FreeBSD 11.01 x64. Closes facebook#1423 Closes facebook#3357 Differential Revision: D6705868 Pulled By: sagar0 fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd
Summary: Flush() call could be waiting indefinitely if min_write_buffer_number_to_merge is used. Consider the sequence: 1. User call Flush() with flush_options.wait = true 2. The manual flush started in the background 3. New memtable become immutable because of writes. The new memtable will not trigger flush if min_write_buffer_number_to_merge is not reached. 4. The manual flush finish. Because of the new memtable created at step 3 not being flush, previous logic of WaitForFlushMemTable() keep waiting, despite the memtables it intent to flush has been flushed. Here instead of checking if there are any more memtables to flush, WaitForFlushMemTable() also check the id of the earliest memtable. If the id is larger than that of latest memtable at the time flush was initiated, it means all the memtable at the time of flush start has all been flush. Closes facebook#3378 Differential Revision: D6746789 Pulled By: yiwu-arbug fbshipit-source-id: 35e698f71c7f90b06337a93e6825f4ea3b619bfa
Summary: Java build on PPC64le has been broken since a few months, due to facebook#2716. Fixing it with the least amount of changes. (We should cleanup a little around this code when time permits). This should fix the build failures seen in http://140.211.168.68:8080/job/Rocksdb/ . Closes facebook#3359 Differential Revision: D6712938 Pulled By: sagar0 fbshipit-source-id: 3046e8f072180693de2af4762934ec1ace309ca4
Summary: Allow StackableDB optionally takes a shared_ptr on construction and thus hold shared ownership of the underlying DB. Closes facebook#3423 Differential Revision: D6824163 Pulled By: yiwu-arbug fbshipit-source-id: dbdc30c42e007533a987ef413785e192340f03eb
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues. Reviewed By: yiwu-arbug Differential Revision: D6821806 fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
Summary:
Workaround a bunch of "implicit-fallthrough" compiler errors, like:
```
util/crc32c.cc:533:7: error: this statement may fall through [-Werror=implicit-fallthrough=]
crc = _mm_crc32_u64(crc, *(uint64_t*)(buf + offset));
^
util/crc32c.cc:1016:9: note: in expansion of macro ‘CRCsinglet’
CRCsinglet(crc0, next, -2 * 8);
^~~~~~~~~~
util/crc32c.cc:1017:7: note: here
case 1:
```
Closes facebook#3339
Reviewed By: sagar0
Differential Revision: D6874736
Pulled By: quark-zju
fbshipit-source-id: eec9f3bc135e12fca336928d01711006d5c3cb16
Summary:
This fixes the following warnings when compiled with GCC7:
util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::DBGet(rocksdb::DB*, rocksdb::Transaction*, rocksdb::ReadOptions&, uint16_t, uint64_t, bool, uint64_t*, std::__cxx11::string*, bool*)’:
util/transaction_test_util.cc:75:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
Status RandomTransactionInserter::DBGet(
^~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc:84:11: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::Verify(rocksdb::DB*, uint16_t, uint64_t, bool, rocksdb::Random64*)’:
util/transaction_test_util.cc:245:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets,
^~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc:268:13: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Closes facebook#3295
Differential Revision: D6609411
Pulled By: maysamyabandeh
fbshipit-source-id: 33f0add471056eb59db2f8bd4366e6dfbb1a187d
Summary: We don't do fsync() after truncate in direct I/O writeable file (in fact we don't do any fsync ever). This can cause metadata not persistent to disk after the file is generated. We call it instead. Closes facebook#3500 Differential Revision: D6981482 Pulled By: siying fbshipit-source-id: 7e2b591b7e5dd1b96fc0775515b8b9e6092980ef
Summary: Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_. This deadlock is hit all the time on our workload. It blocks our release. In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so. So, probably the only safe way to use ThreadLocalPtr callbacks is to do only do simple and lock-free things in them. This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup. I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine. Closes facebook#3510 Reviewed By: sagar0 Differential Revision: D7005346 Pulled By: al13n321 fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f
Summary: Added a new iterator property: `rocksdb.iterator.internal-key` to get the internal-key (converted to user key) at which the iterator stopped. Closes facebook#3525 Differential Revision: D7033694 Pulled By: sagar0 fbshipit-source-id: d51e6c00f5e9d766c6276ef79774b81c6c5216f8
Summary: Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier. Closes facebook#3463 Differential Revision: D6893333 Pulled By: ajkr fbshipit-source-id: fd7622978f4b2487fce33cde40dd3124f16bcaa8
Summary: As discovered during v5.9.2 release, and forward-ported. Closes facebook#3323 Differential Revision: D6657209 Pulled By: siying fbshipit-source-id: b560d9f8ddb89e0ffaff7c895ec80f68ddf7dab4
Member
Author
|
@huachaohuang Do we have any customization? |
|
@zhangjinpeng1987 I'm not entirely convinced this is even used. The |
mittalrishabh
added a commit
to mittalrishabh/rocksdb
that referenced
this pull request
Apr 12, 2024
…v#14) Signed-off-by: Qi Xu <[email protected]> Co-authored-by: Michael Deng <[email protected]> Co-authored-by: Qi Xu <[email protected]> Co-authored-by: rishabh_mittal <[email protected]> Co-authored-by: Andrew Kryczka <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Keep consistency with 5.10.fb.