-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Use the upstream arrow-rs coalesce kernel #17193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the upstream arrow-rs coalesce kernel #17193
Conversation
…eam_arrow_coalesce
…eam_arrow_coalesce
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
8dec409 to
d303a4d
Compare
|
Hi @alamb I am reviving this PR since the upstream arrow-rs has be upgraded, thanks! |
Thank you @zhuqi-lucas -- I was just thinking the other day that we (I really) have dropped the ball on the coalesce kernel. I feel like once we get it into DataFusion we can then drive improvements upstream in arrow... |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Thank you @alamb for benchmark, the result is good! |
|
I agree the benchmarks look good to me (no significant difference in performance) @zhuqi-lucas would it be ok to update the title of this PR and its description? I think the core change here is to use the upstream 'coalesce' kernel. Is that your understanding too? |
Updated the title now, thank you @alamb . |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuqi-lucas
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
|
Thank you @alamb @2010YOUY01 for review! Merged now. |
|
Awesome -- one step closer |
## Which issue does this PR close? Use the upstream arrow-rs coalesce kernel, and support LimitedBatchCoalesce for datafusion ## Rationale for this change Use the upstream arrow-rs coalesce kernel, also it will support LimitedBatchCoalesce for datafusion. There are some future work based this, for example [Push limit into joins ](apache#18295) which will optimize join. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Daniël Heres <[email protected]>
Which issue does this PR close?
Use the upstream arrow-rs coalesce kernel, and support LimitedBatchCoalesce for datafusion
Rationale for this change
Use the upstream arrow-rs coalesce kernel, also it will support LimitedBatchCoalesce for datafusion. There are some future work based this, for example Push limit into joins which will optimize join.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?