Avoiding spilling in TopK queries by reinserting the to-spill data to memory buffer #3579

@Dandandan

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We recently added optimizations for ORDER BY expr LIMIT queries: limits are now pushed down to individual operators (saving memory and CPU time and limiting output rows), and sorts are executed in parallel.

The disk spill operation in SortExec currently still assumes that the to-spill data doesn't fit in memory.
However, after sorting we only need to keep the batch(es) containing the top fetch rows. Storing only those would probably avoid spilling to disk.

Describe the solution you'd like
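When a spill would be triggered, we can merge / sort the to-spill data, keep only the top fetch rows, and check whether the result fits in memory. If it does, we reinsert it into the memory buffer and avoid spilling to disk.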
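
To make the idea concrete, here is a minimal sketch in plain Rust. It is not the SortExec implementation and does not use any DataFusion API. `TopKBuffer`, `SpillDecision`, `insert_batch` and `max_rows` are hypothetical names, rows are plain `i64` values standing in for record batches, and the memory budget is counted in rows rather than bytes.

```rust
/// Outcome of handling memory pressure in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum SpillDecision {
    /// After sorting and truncating to `fetch` rows the data fits again,
    /// so it is reinserted into the in-memory buffer.
    KeptInMemory,
    /// Even the truncated data exceeds the budget; a real operator
    /// would have to spill to disk at this point.
    SpillToDisk,
}

/// Hypothetical in-memory sort buffer for an ORDER BY ... LIMIT query.
struct TopKBuffer {
    rows: Vec<i64>,
    /// The LIMIT pushed down into the sort.
    fetch: usize,
    /// Stand-in for a memory budget, counted in rows (an assumption).
    max_rows: usize,
}

impl TopKBuffer {
    fn new(fetch: usize, max_rows: usize) -> Self {
        TopKBuffer { rows: Vec::new(), fetch, max_rows }
    }

    /// Buffers a batch. Returns `None` while the budget is respected,
    /// otherwise the decision taken under memory pressure.
    fn insert_batch(&mut self, batch: &[i64]) -> Option<SpillDecision> {
        self.rows.extend_from_slice(batch);
        if self.rows.len() <= self.max_rows {
            return None;
        }
        // Memory pressure: sort the to-spill data and keep only the
        // top `fetch` rows instead of spilling right away.
        self.rows.sort_unstable();
        self.rows.truncate(self.fetch);
        if self.rows.len() <= self.max_rows {
            Some(SpillDecision::KeptInMemory)
        } else {
            Some(SpillDecision::SpillToDisk)
        }
    }

    /// Final sort and truncation, returning the top `fetch` rows.
    fn finish(mut self) -> Vec<i64> {
        self.rows.sort_unstable();
        self.rows.truncate(self.fetch);
        self.rows
    }
}
```

With `fetch = 3` and a budget of 5 rows, inserting `[9, 4, 7, 1]` and then `[8, 2, 6]` exceeds the budget, but sorting and truncating leaves only `[1, 2, 4]`, so nothing needs to be spilled. A spill is only required when `fetch` itself is larger than what fits in memory.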

Labels: enhancement (New feature or request)
