-
Notifications
You must be signed in to change notification settings - Fork 6.7k
Introduce conditional merge-operator invocation in point lookups #2923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
To support limiting the number of merge operands that are looked at during a full-merge as part of a Get operation.
463badc to
1bfc366
Compare
|
@sagar0 has updated the pull request. |
|
@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
|
@sagar0 has updated the pull request. View: changes, changes since last import |
|
@sagar0 has updated the pull request. View: changes, changes since last import |
|
@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
siying
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure the performance doesn't regress before committing it.
Benchmark results:Command line to Load data: ReadRandomMergeRandom bechmark results: Without this PR (on commit fc7476b): With the current PR: |
|
@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
| // during a point lookup, thereby helping in limiting the number of levels to | ||
| // read from. | ||
| // Doesn't help with iterators. | ||
| virtual bool ShouldMerge(const std::vector<Slice>& operands) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one question, when this gets called, do we already read all the operands from the disk? which means we still need to pay the disk io?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This ShouldMerge callback function gets called for each merge operand as we go down the levels, and we don't read more operands when ShouldMerge returns true. So we don't pay for any more disk io than what is absolutely necessary.
Example: Lets say:
bool ShouldMerge(const std::vector<Slice>& operands) const {
return (operands.size() >= 2);
}
Lets say, a key k1 has 8 merge operands spread over different levels.
On encountering a Get(k1), ShouldMerge gets called with one operand first, it returns false, and we continue trying to find more merge operands, if any.
Next it gets called when the second operand is encountered, returns true, which leads to invoking the FullMerge operator, and a value is returned.
All the merge operands for k1 at higher levels are not read from disk.
Summary: Now that RocksDB supports conditional merging during point lookups (introduced in #2923), Cassandra value merge operator can be updated to pass in a limit. The limit needs to be passed in from the Cassandra code. Closes #2947 Differential Revision: D5938454 Pulled By: sagar0 fbshipit-source-id: d64a72d53170d8cf202b53bd648475c3952f7d7f
Summary
For every merge operand encountered for a key in the read path we now have the ability to decide whether to look further (to retrieve more merge operands for the key) or stop and invoke the merge operator to return the value. The user needs to override
ShouldMerge()method with a condition to terminate search when true to avail this facility.This has a couple of advantages:
Example: Limiting the number of merge operands that are looked at: Lets say you have 10 merge operands for a key spread over various levels. If you only want RocksDB to look at the latest two merge operands instead of all 10 to compute the value, it is now possible with this PR. You can set the condition in
ShouldMerge()to return true when the size of the operand list is 2. Look at the example implementation in the unit test. Without this PR, a Get might look at all the 10 merge operands in different levels before invoking the merge-operator.Test Plan:
Added a new unit test.
Made sure that there is no perf regression by running benchmarks.
Benchmark results:
Command line to Load data:
ReadRandomMergeRandom bechmark results:
Command line:
Base -- Without this code change (on commit fc7476b):
With this code change: