Implement specialized group values for single Utf8/LargeUtf8/Binary/LargeBinary column #8827
Conversation
I think this is getting pretty close. I need to add some more fuzz tests, and maybe try to make this code generic so it also works for `BinaryArray`, but otherwise it is ready.
```rust
/// Maximum size of a string that can be inlined in the hash table
```
This code and its tests were moved into `bytes_map.rs` and made general.
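The short-string inlining idea behind this comment (a maximum size for values stored directly in the hash table) can be sketched in std-only Rust. This is an illustrative sketch with assumed names and an assumed 8-byte threshold, not the actual DataFusion `bytes_map.rs` code:

```rust
/// Illustrative sketch: values up to SHORT_LEN bytes are stored inline in a
/// fixed-size buffer, so short group keys need no extra heap indirection;
/// longer values are appended once to one shared buffer and referenced by
/// (offset, len). The names and the 8-byte threshold are assumptions.
const SHORT_LEN: usize = 8;

#[derive(Debug)]
enum Entry {
    /// Bytes stored directly in the entry itself
    Inline { len: u8, buf: [u8; SHORT_LEN] },
    /// Bytes appended to a shared buffer; the entry stores only offset + length
    Spilled { offset: usize, len: usize },
}

fn make_entry(value: &[u8], shared: &mut Vec<u8>) -> Entry {
    if value.len() <= SHORT_LEN {
        // Copy the short value into the inline buffer
        let mut buf = [0u8; SHORT_LEN];
        buf[..value.len()].copy_from_slice(value);
        Entry::Inline { len: value.len() as u8, buf }
    } else {
        // Append the long value to the shared buffer and remember where it is
        let offset = shared.len();
        shared.extend_from_slice(value);
        Entry::Spilled { offset, len: value.len() }
    }
}
```

Comparing two inline entries then touches no heap memory at all, which is part of why a generalized bytes map can beat the row-based path for short keys.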
Update: I have written a fuzz test (which actually caught several bugs 🐛). I plan to take a final pass through the code tomorrow morning and put it up for review. It is a pretty large PR, so I expect it will take time to review, but I think it will be worth it.
Update: the benchmark programs consistently show this query as slower, but I can't reproduce the difference. I am continuing to investigate.
```rust
});
// SAFETY: the offsets were constructed correctly in `insert_if_new` --
// monotonically increasing, overflows were checked.
let offsets = unsafe { OffsetBuffer::new_unchecked(ScalarBuffer::from(offsets)) };
```
Validating the offsets actually took significant time, which took me a while to find and track down (without this unsafe call, ClickBench Q28 slowed down by 10%).
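For context, the invariant that the checked constructor verifies, and that `new_unchecked` skips, can be sketched as a std-only helper. The function name here is illustrative, not the arrow-rs API:

```rust
/// Sketch of the offset invariant for a byte array: the first offset is 0
/// and offsets never decrease, so offsets[i]..offsets[i+1] is always a valid
/// non-negative range. A checked constructor must scan every offset (an O(n)
/// pass per output batch), which is the validation cost the `new_unchecked`
/// call above avoids. (Illustrative helper, not part of the arrow-rs API.)
fn offsets_are_valid(offsets: &[i32]) -> bool {
    offsets.first() == Some(&0) && offsets.windows(2).all(|w| w[0] <= w[1])
}
```

Since the group-values map constructs the offsets itself and already checks for overflow on insert, re-validating them at output time is pure overhead.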
FYI @thinkharderdev and @Dandandan, as you have expressed interest in this type of optimization before: this PR is now ready for review.
Dandandan left a comment:
🚀
Thank you for the review @Dandandan 🙏
🚀 |
Note: this looks like a large PR, but a substantial amount of it is moving code, tests, and documentation.
Which issue does this PR close?
closes #7064
Rationale for this change
This PR makes queries that group on strings faster.
ClickBench seems slightly slower in 34.0.0 than in 33.0.0 (#8789 (comment)), so we need some more real wins.
One thing that takes significant time in queries is grouping on a single string column, where the current `GroupsAccumulator` copies the strings several times:
1. Into `Rows`
2. Into `OwnedRow`
3. From `Rows` to form the final output `Array`

We can make such queries faster by avoiding the copies.
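The copy-avoidance idea can be sketched in std-only Rust: append each group key's bytes once into a shared buffer and record offsets, so the final output array can be built directly from (offsets, buffer). The type and method names here are illustrative assumptions, not the actual DataFusion implementation:

```rust
/// Illustrative sketch: group keys are appended exactly once to a contiguous
/// buffer, and `offsets[i]..offsets[i + 1]` delimits group i's bytes. A final
/// string/binary array can then be built from (offsets, buffer) directly,
/// instead of copying each key through several row representations.
struct ByteGroupValues {
    buffer: Vec<u8>,
    offsets: Vec<i32>,
}

impl ByteGroupValues {
    fn new() -> Self {
        // One leading 0 so group i is delimited by offsets[i]..offsets[i + 1]
        Self { buffer: Vec::new(), offsets: vec![0] }
    }

    /// Append a new group value (one copy, ever); returns its group index.
    fn append(&mut self, value: &[u8]) -> usize {
        self.buffer.extend_from_slice(value);
        self.offsets.push(self.buffer.len() as i32);
        self.offsets.len() - 2
    }

    /// Borrow group `i`'s bytes without copying.
    fn value(&self, i: usize) -> &[u8] {
        &self.buffer[self.offsets[i] as usize..self.offsets[i + 1] as usize]
    }
}
```

In the real code the (offsets, buffer) pair is what feeds the `OffsetBuffer` construction shown in the review snippet above, so producing output is a handoff rather than a copy.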
What changes are included in this PR?
- Moves the `SSOStringSet` that @jayzhan211 and I implemented in Optimize `COUNT(DISTINCT ...)` for strings (up to 9x faster) #8849 into `ArrowBinaryMap`, which also handles values and both string and binary data
- Extends the `COUNT DISTINCT` specialization to handle binary / large binary
- Uses `ArrowBinaryMap` to implement `GroupValuesByes`, a `GroupValues` implementation for String / LargeString / Binary / LargeBinary data

Are these changes tested?
Yes:
- New fuzz tests; follow-on ticket: `MemTable::with_sort_exprs` #9190

Are there any user-facing changes?
Faster performance.
For ClickBench, several of the longest-running queries got significantly faster. This is almost certainly the result of less string copying, as they are grouping by "Url", a string column. There are two that are reported to be slower, but given that they run in 10s of ms, I think this is likely measurement error.
For ClickBench extended, things got faster.
For TPCH there were also several queries that got faster, but the time spent is so small (10s of ms) that I think it is likely measurement error.
Full Performance Results
TODOs
- Potentially move `ArrowStringMap` upstream to arrow-rs (could use the `DictionaryArray` builder)