-
Notifications
You must be signed in to change notification settings - Fork 6.6k
pass key/value samples through zstd compression dictionary generator #3057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
yiwu-arbug
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
util/compression.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this answer the following should be safe:
https://stackoverflow.com/questions/39200665/directly-write-into-char-buffer-of-stdstring
std::string dict_data(max_dict_bytes, '\0');
size_t dict_len =
ZDICT_trainFromBuffer(&dict_data[0], max_dict_bytes, &samples[0],
&sample_lens[0], sample_lens.size());
...
return dict_data;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Yi.
|
@ajkr has updated the pull request. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Instead of using samples directly, we now support passing the samples through zstd's dictionary generator when `CompressionOptions::zstd_max_train_bytes` is set to nonzero. If set to zero, we will use the samples directly as the dictionary -- same as before. Note this is the first step of facebook#2987, extracted into a separate PR per reviewer request. Closes facebook#3057 Differential Revision: D6116891 Pulled By: ajkr fbshipit-source-id: 70ab13cc4c734fa02e554180eed0618b75255497
Summary: - moved existing compression options to `InitializeOptionsGeneral` since they cannot be set through options file - added flag for `zstd_max_train_bytes` which was recently introduced by #3057 Closes #3128 Differential Revision: D6240460 Pulled By: ajkr fbshipit-source-id: 27dbebd86a55de237ba6a45cc79cff9214e82ebc
Summary: With the ZSTD dictionary generator support added in #3057 `PORTABLE=1 ROCKSDB_NO_FBCODE=1 make rocksdbjavastatic` fails as it can't find zdict.h. Specifically due to: https://github.com/facebook/rocksdb/blob/e3a06f12d27fd50af7b6c5941973f529601f9a3e/util/compression.h#L39 In java static builds zstd code gets directly downloaded from https://github.com/facebook/zstd , and in there zdict.h is under dictBuilder directory. So, I modified libzstd.a target to use `make install` to collect all the header files into a single location and used that as the zstd's include path. Closes #3260 Differential Revision: D6669850 Pulled By: sagar0 fbshipit-source-id: f8a7562a670e5aed4c4fb6034a921697590d7285
Summary: With the ZSTD dictionary generator support added in #3057 `PORTABLE=1 ROCKSDB_NO_FBCODE=1 make rocksdbjavastatic` fails as it can't find zdict.h. Specifically due to: https://github.com/facebook/rocksdb/blob/e3a06f12d27fd50af7b6c5941973f529601f9a3e/util/compression.h#L39 In java static builds zstd code gets directly downloaded from https://github.com/facebook/zstd , and in there zdict.h is under dictBuilder directory. So, I modified libzstd.a target to use `make install` to collect all the header files into a single location and used that as the zstd's include path. Closes #3260 Differential Revision: D6669850 Pulled By: sagar0 fbshipit-source-id: f8a7562a670e5aed4c4fb6034a921697590d7285
Instead of using samples directly, we now support passing the samples through zstd's dictionary generator when
CompressionOptions::zstd_max_train_bytesis set to nonzero. If set to zero, we will use the samples directly as the dictionary -- same as before.Note this is the first step of #2987, extracted into a separate PR per reviewer request.
Test Plan:
make check -j64