Skip to content

Add partial Sort Operator #7456

@mustafasrepo

Description

@mustafasrepo

Is your feature request related to a problem or challenge?

Consider a use case where required ordering is (a ASC,b ASC), and existing ordering is (a ASC).
As an example input is like following

a b
1 2
1 3
1 1
2 2
2 3
2 1

expected output is like following

a b
1 1
1 2
1 3
2 1
2 2
2 3

If we were to use information about existing ordering. We could buffer up a values until it changes like below

a b
1 2
1 3
1 1

when 2 is received for the value of a. We could then sort subtable according to desired ordering (b ASC), then emit following result

a b
1 1
1 2
1 3

I think this operator

  • Enable us to use SortExec without breaking pipeline for some use cases (for this behavior we can write a new operator also).
  • Decrease the memory usage of the SortExec when input ordering satisfy a prefix of the desired ordering.

Describe the solution you'd like

No response

Describe alternatives you've considered

This should be a new operator or current SortExec can be extended to behave this way.
However, I do think that extending current SortExec to behave this way is better option because:

  • Existing plans will immediately benefit from this change (otherwise we need to write a rule to choose between SortExec and new PartialSortExec)
  • less change will be introduced to code base. Since I presume, the two operators will have lots of code common anyway.

Additional context

See discussion for more background.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions