Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I plan to make sorting / merging faster. My reasons:
- I find it personally interesting
- It is a key piece of technology for bringing DataFusion's performance on par with systems like DuckDB
- It is important for my project IOx in the medium term
Describe the solution you'd like
The plan is to follow the advice given by Goetz Graefe in "Implementing Sorting in Database Systems",
which has been implemented successfully in systems like DuckDB (see blog post).
It will likely involve some combination of a specialized row format and JIT-compiled comparisons.
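To make the row format idea concrete, here is a minimal sketch of a byte-comparable encoding for composite sort keys (sometimes called normalized keys). It is not DataFusion's row format: the `Value` enum, `encode_value`, and `encode_row` are hypothetical names invented for this illustration, only `i32` and string columns are covered, and nulls are ignored. The assumption is that once each row is encoded this way, comparing two rows reduces to a single unsigned byte-wise comparison.

```rust
/// A sort key column value; only two types are covered in this sketch.
enum Value<'a> {
    Int(i32),
    Str(&'a str),
}

/// Append `v` to `out` so that unsigned byte order matches value order.
/// When `descending` is true the column's bytes are inverted, which
/// reverses the order for that column only.
fn encode_value(v: &Value, descending: bool, out: &mut Vec<u8>) {
    let start = out.len();
    match v {
        Value::Int(i) => {
            // Flip the sign bit, then store big-endian, so negative
            // values sort before positive ones under byte comparison.
            out.extend_from_slice(&((*i as u32) ^ 0x8000_0000).to_be_bytes());
        }
        Value::Str(s) => {
            // Escape embedded 0x00 as 0x00 0xFF and terminate with
            // 0x00 0x00, so a prefix always sorts before its extensions
            // and the next column's bytes cannot bleed into this one.
            for &b in s.as_bytes() {
                out.push(b);
                if b == 0 {
                    out.push(0xFF);
                }
            }
            out.extend_from_slice(&[0, 0]);
        }
    }
    if descending {
        for b in &mut out[start..] {
            *b = !*b;
        }
    }
}

/// Encode all sort columns of one row into a single comparable byte string.
fn encode_row(values: &[Value], descending: &[bool]) -> Vec<u8> {
    let mut out = Vec::new();
    for (v, d) in values.iter().zip(descending) {
        encode_value(v, *d, &mut out);
    }
    out
}
```

With this encoding, `encode_row(&[Value::Int(-5), Value::Str("b")], &[false, false])` compares less than the encoding of `(3, "a")` using plain `Vec<u8>` ordering, with no per-column type dispatch in the comparison itself.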
Here is my rough plan and a sketch of the kinds of things I want to work on:
- Benchmark for sort preserving merge #2431
- Research sort directly on raw bytes of composite sort keys for better performance #2150
- Research usage of row format for sort records buffering #2151
- Proof of concept (POC) of comparisons using the row format
- Add full type support for row format comparisons
- Turn the POC into a production implementation
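As a rough illustration of what the merge side of this plan could look like once keys are byte-comparable, the sketch below merges several already-sorted runs of encoded keys. It is an assumption-laden toy, not the DataFusion sort preserving merge operator: `merge_sorted_runs` is a hypothetical name, keys are plain `Vec<u8>` values held in memory, and a standard-library binary heap stands in for whatever merge structure the final design uses.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted runs of byte-comparable keys into one sorted output.
/// Ties between runs are broken by run index, so earlier runs win.
fn merge_sorted_runs(runs: &[Vec<Vec<u8>>]) -> Vec<Vec<u8>> {
    // Min-heap of (next key, run index, position within that run).
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(first) = run.first() {
            heap.push(Reverse((first.as_slice(), run_idx, 0usize)));
        }
    }

    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((key, run_idx, pos))) = heap.pop() {
        out.push(key.to_vec());
        // Refill the heap from the run that just produced the smallest key.
        if let Some(next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next.as_slice(), run_idx, pos + 1)));
        }
    }
    out
}
```

Every comparison inside the heap is a byte-slice comparison, which is the property the row format work above is meant to provide; a benchmark would need to check whether that actually beats per-column comparisons.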
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.