Open
Labels
PROPOSAL EPIC (a proposal being discussed that is not yet fully underway), enhancement (new feature or request)
Description
Is your feature request related to a problem or challenge?
This ticket links to a collection of ways to make queries with LIMIT, or with related variants such as row_number() predicates, both:
- Go faster
- Use less memory
These are typically called "Top K" style optimizations in databases, and they optimize the pattern of a sort followed by a limit:
LIMIT(fetch = 10)
SORT(x)
INPUT...
The observation is that if the INPUT is much larger than the fetch (aka the K), it is much more efficient and less memory intensive to track only the top 10 values than to sort the entire input and discard everything except the top 10.
Normally this is done with special ExecutionPlan operators. What the operators do and how they behave depend on the exact query pattern.
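To make the difference concrete, here is a minimal sketch in plain Rust using only the standard library. It is not DataFusion's operator, and the function names `sort_then_limit` and `top_k_smallest` are hypothetical, chosen only for this illustration. It assumes an ascending sort on a single integer column.

```rust
use std::collections::BinaryHeap;

/// Baseline plan: SORT the whole input, then LIMIT to the first `k` rows.
/// Needs O(n) memory and O(n log n) time for n input rows.
fn sort_then_limit(mut input: Vec<i64>, k: usize) -> Vec<i64> {
    input.sort_unstable();
    input.truncate(k);
    input
}

/// Top-K sketch: stream the input and keep only the `k` smallest values
/// seen so far in a bounded max-heap. Needs O(k) memory and O(n log k) time.
fn top_k_smallest(input: impl IntoIterator<Item = i64>, k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    // BinaryHeap is a max-heap, so the largest retained value sits on top.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k);
    for v in input {
        if heap.len() < k {
            heap.push(v);
        } else if let Some(mut top) = heap.peek_mut() {
            // Replace the current worst candidate only if `v` beats it.
            // The heap restores its ordering when `top` is dropped.
            if v < *top {
                *top = v;
            }
        }
    }
    // Return the retained values in ascending order.
    heap.into_sorted_vec()
}
```

Both functions return the same rows, but the second never holds more than `k` values at a time, which is where the memory saving comes from when the input is much larger than K.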
Describe the solution you'd like
- Optimize "LIMIT" queries for speed / memory with special TopK operator #7196
- Top-K query optimization in sort uses substantial memory #7149
- Improve Memory usage + performance with large numbers of groups / High Cardinality Aggregates #6937
- Optimize SELECT min/max queries with limit #7198
- Improve aggregate performance with specialized groups accumulator for single string group by #7064
- Optimize "per partition" top-k: ROW_NUMBER < 5 / TopK #6899
- Memory is coupled to group by cardinality, even when the aggregate output is truncated by a limit clause #7191
- Add a built-in UDAF approx_sum_topn based on space saving algorithm #2365
- Do not sort data that is already sorted #7162
- Avoiding spilling in TopK queries by reinserting the to-spill data to memory buffer #3579
- Question: is the combination of limit and predicate push-down safe in ParquetExec? #900
Describe alternatives you've considered
No response
Additional context
No response
liukun4515