Support ORDER BY in AggregateUDF  #8984

@alamb

Is your feature request related to a problem or challenge?

Some built-in aggregates (such as FIRST_VALUE, LAST_VALUE and ARRAY_AGG) support an optional ORDER BY argument that defines the order in which they see their input. For example:

❯ create table foo(x int, y int) as values (1, 100),(2, 100),(0, 200);
0 rows in set. Query took 0.003 seconds.

-- note the `ORDER BY x` in the argument to `FIRST_VALUE`
❯ select FIRST_VALUE(x ORDER BY x) from foo GROUP BY y;
+--------------------+
| FIRST_VALUE(foo.x) |
+--------------------+
| 1                  |
| 0                  |
+--------------------+
2 rows in set. Query took 0.008 seconds.
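
To make the semantics of the query above concrete, here is a minimal, standalone Rust sketch of what `FIRST_VALUE(x ORDER BY x) ... GROUP BY y` computes. It does not use any DataFusion API; the function name `first_value_ordered_by_x` and the `(x, y)` tuple representation of the rows are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Rows are `(x, y)` pairs, mirroring the columns of table `foo`.
/// Returns a map from each `y` group to FIRST_VALUE(x ORDER BY x).
/// (Hypothetical helper, not part of DataFusion.)
fn first_value_ordered_by_x(rows: &[(i32, i32)]) -> BTreeMap<i32, i32> {
    // Step 1: GROUP BY y, collecting the x values of each group.
    let mut groups: BTreeMap<i32, Vec<i32>> = BTreeMap::new();
    for &(x, y) in rows {
        groups.entry(y).or_default().push(x);
    }

    // Step 2: within each group, apply the ORDER BY x, then take the first value.
    groups
        .into_iter()
        .map(|(y, mut xs)| {
            xs.sort();
            // Every group holds at least one value, so indexing is safe here.
            (y, xs[0])
        })
        .collect()
}
```

Called with the rows `(1, 100), (2, 100), (0, 200)`, this maps group `100` to `1` and group `200` to `0`, matching the query output above.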

User defined aggregates do not support an ORDER BY argument today.

Describe the solution you'd like

I would like to be able to create a user defined aggregate that can specify its input order.

This would roughly require:

  1. Extending the AggregateUDFImpl trait to communicate the ordering somehow.
  2. Updating the implementation of https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.AggregateExpr.html#method.order_bys
  3. Writing an end to end test in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_aggregates.rs showing it all working
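
As a rough illustration of the shape step 1 could take, here is a standalone sketch. Everything in it is hypothetical: `OrderedAggregate`, `SortDirection`, `FirstValue` and `run_aggregate` are stand-ins invented for this example and are not DataFusion types or the proposed API. The idea is only that the aggregate declares the ordering it wants, and the planner satisfies that ordering before the aggregate sees its input.

```rust
/// Hypothetical sort direction; NOT a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SortDirection {
    Ascending,
    Descending,
}

/// Hypothetical stand-in for an aggregate UDF trait extended with an ordering hook.
trait OrderedAggregate {
    /// The input ordering this aggregate requests.
    /// `None` (the default) means the aggregate is order-insensitive.
    fn ordering(&self) -> Option<SortDirection> {
        None
    }

    /// Fold the input, which has already been ordered as requested, into one value.
    fn evaluate(&self, input: &[i64]) -> Option<i64>;
}

/// A user defined FIRST_VALUE-like aggregate that declares its ordering.
struct FirstValue {
    direction: SortDirection,
}

impl OrderedAggregate for FirstValue {
    fn ordering(&self) -> Option<SortDirection> {
        Some(self.direction)
    }

    fn evaluate(&self, input: &[i64]) -> Option<i64> {
        input.first().copied()
    }
}

/// Hypothetical planner step: satisfy the requested ordering, then run the aggregate.
fn run_aggregate(agg: &dyn OrderedAggregate, mut input: Vec<i64>) -> Option<i64> {
    match agg.ordering() {
        Some(SortDirection::Ascending) => input.sort(),
        Some(SortDirection::Descending) => input.sort_by(|a, b| b.cmp(a)),
        None => {} // order-insensitive: leave the input as it arrived
    }
    agg.evaluate(&input)
}
```

With input `[2, 0, 1]`, an ascending `FirstValue` evaluates to `Some(0)` and a descending one to `Some(2)`. A real implementation would need to express the ordering as sort expressions over columns rather than a single direction, and would have to flow through the physical planner rather than sort inline.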

Here are some other places that likely need to be changed:
https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L242-L252

https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L1663-L1690

Looking at how OrderSensitiveArrayAgg is implemented may help: https://github.com/apache/arrow-datafusion/blob/5d70c32a9a4accf21e9f27ff5ed62666cbbcbe54/datafusion/physical-expr/src/aggregate/array_agg_ordered.rs#L45

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement