Closed
Labels: enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Consider a use case where the required ordering is `(a ASC, b ASC)` and the existing ordering is `(a ASC)`.
For example, given the following input:
| a | b |
|---|---|
| 1 | 2 |
| 1 | 3 |
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 1 |
the expected output is:
| a | b |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
If we use the information about the existing ordering, we can buffer rows until the value of `a` changes, like below:
| a | b |
|---|---|
| 1 | 2 |
| 1 | 3 |
| 1 | 1 |
When a row with `a = 2` is received, the group for `a = 1` is complete. We can then sort this sub-table by the remaining part of the desired ordering (`b ASC`) and emit the following result:
| a | b |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
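The buffering strategy above can be sketched as a streaming generator. This is a minimal illustration, not `SortExec` code: it assumes rows are plain tuples, that every column is sorted ascending, and that the input is already sorted on its first `prefix_len` columns (`partial_sort` and `prefix_len` are hypothetical names).

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, ...]


def partial_sort(rows: Iterable[Row], prefix_len: int) -> Iterator[Row]:
    """Emit rows sorted on all columns (ASC), assuming the input is already
    sorted (ASC) on its first `prefix_len` columns."""
    buffer: List[Row] = []
    current_prefix: Optional[Row] = None
    for row in rows:
        prefix = row[:prefix_len]
        if buffer and prefix != current_prefix:
            # The prefix value changed, so the buffered group is complete:
            # sort it by the remaining columns and emit it right away.
            buffer.sort(key=lambda r: r[prefix_len:])
            yield from buffer
            buffer = []
        current_prefix = prefix
        buffer.append(row)
    # Flush the last group once the input is exhausted.
    buffer.sort(key=lambda r: r[prefix_len:])
    yield from buffer


rows = [(1, 2), (1, 3), (1, 1), (2, 2), (2, 3), (2, 1)]
print(list(partial_sort(rows, prefix_len=1)))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
```

Because it is a generator, the `a = 1` group is yielded as soon as the first `a = 2` row is read, before the rest of the input is consumed; this is the "not breaking the pipeline" behavior described in this issue.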
I think this operator would:
- Enable us to use `SortExec` without breaking the pipeline for some use cases (we could also write a new operator for this behavior).
- Decrease the memory usage of `SortExec` when the input ordering satisfies a prefix of the desired ordering.
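To make the memory claim concrete: a full sort must hold every input row, while a partial sort only holds the largest run of rows that share the same prefix value. The helper below is a hypothetical illustration (not part of DataFusion) that assumes rows are tuples and the input is already sorted on the prefix columns.

```python
from itertools import groupby
from typing import Sequence, Tuple


def peak_buffered_rows(rows: Sequence[Tuple[int, ...]], prefix_len: int) -> int:
    """Largest number of rows a partial sort holds at once: the size of the
    longest run of consecutive rows sharing the same first `prefix_len` columns."""
    # groupby groups *consecutive* equal keys, which matches the runs a
    # streaming partial sort would buffer.
    runs = groupby(rows, key=lambda r: r[:prefix_len])
    return max((sum(1 for _ in group) for _, group in runs), default=0)


rows = [(1, 2), (1, 3), (1, 1), (2, 2), (2, 3), (2, 1)]
print(peak_buffered_rows(rows, prefix_len=1), len(rows))
# 3 6
```

With `prefix_len=0` every row has the same (empty) prefix, so the peak equals the input size, i.e. the partial sort degenerates to a full sort.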
Describe the solution you'd like
No response
Describe alternatives you've considered
This could be a new operator, or the current `SortExec` could be extended to behave this way.
However, I think extending the current `SortExec` is the better option because:
- Existing plans will immediately benefit from this change (otherwise we need to write a rule to choose between `SortExec` and a new `PartialSortExec`).
- Less change will be introduced to the code base, since I presume the two operators would have a lot of code in common anyway.
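For comparison, if a separate operator were added, the planner rule mentioned above would need to decide how much of the required ordering the input already provides. The sketch below is hypothetical: it represents an ordering as a list of `(column, descending)` pairs (not DataFusion's actual types) and picks a strategy from the length of the common prefix.

```python
from typing import List, Tuple

# Hypothetical sort key representation: (column name, descending?).
SortKey = Tuple[str, bool]


def common_prefix_len(existing: List[SortKey], required: List[SortKey]) -> int:
    """Number of leading required sort keys already satisfied by the input."""
    n = 0
    for have, want in zip(existing, required):
        if have != want:
            break
        n += 1
    return n


def choose_sort(existing: List[SortKey], required: List[SortKey]) -> str:
    k = common_prefix_len(existing, required)
    if k == len(required):
        return "none"     # input ordering already satisfies the requirement
    if k == 0:
        return "full"     # nothing to reuse: plain full sort
    return "partial"      # sort only within runs of equal prefix values


print(choose_sort([("a", False)], [("a", False), ("b", False)]))
# partial
```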
Additional context
See discussion for more background.