-
Notifications
You must be signed in to change notification settings - Fork 86
Bump DataFusion to 50 and arrow to 56 #4577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1f1416a to
018f180
Compare
CodSpeed Performance ReportMerging #4577 will improve performances by 41.79%Comparing Summary
Benchmarks breakdown
|
Codecov Report❌ Patch coverage is ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
0ax1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question inline.
| to_arrow_primitive::<Float64Type>(array) | ||
| } | ||
| (Canonical::Decimal(array), DataType::Decimal32(precision, scale)) => { | ||
| if array.decimal_dtype().precision() != *precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check too restrictive? Why not check whether the target type can hold at least the same precision and scale? Maybe I'm misreading something here.
array.decimal_dtype().precision() <= *precision
array.decimal_dtype().scale() <= *scale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in bail if array.decimal_dtype().precision() > *precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree, I'll think about it a bit more but probably include it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After spending some more time with with arrow's and datafusion's decimal-handling code, I think we shouldn't change precision and scale here. Keeping it this way guarantees predictable behavior and zero-copy, unless explicitly cast.
|
Some work here depends on the outcome of apache/datafusion#17489, if DataFusion's support for smaller decimals doesn't improve, I'll have to rethink some stuff here. |
| } | ||
|
|
||
| pub(crate) fn can_be_pushed_down(expr: &PhysicalExprRef, schema: &Schema) -> bool { | ||
| // We currently do not support pushdown of dynamic expressions in DF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the dynamic filter will be gradually stable in the next few versions. How about recording an issue for this? There may be potential contributors to participate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already opened :) #4034
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I should link to the issue in the code, I've also expanded there a bit about my current understanding of the issue, which might not be correct but probably worth externalizing.
e407032 to
0a99805
Compare
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
0a99805 to
88eaa65
Compare
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Just made sure there's no breaking changes coming our way. Includes support for the two new Arrow decimal types (32 and 64 bits), but without converting from Vortex dtype into them. Codspeed improvements are due to a new inline hint on arrow's `BitIterator::next` (found by @0ax1 🥳). Open issues: - [ ] apache/datafusion#17489 - mostly affects if/how much support we want to provide to decimals, or whether we need to provide some other solution for DataFusion. - [ ] Issue with followups - vortex-data#4668 --------- Signed-off-by: Adam Gutglick <[email protected]> (cherry picked from commit cc426c8)
Just made sure there's no breaking changes coming our way. Includes support for the two new Arrow decimal types (32 and 64 bits), but without converting from Vortex dtype into them.
Codspeed improvements are due to a new inline hint on arrow's
BitIterator::next(found by @0ax1 🥳).Open issues: