Description
Is your feature request related to a problem or challenge?
Some built-in aggregates (such as FIRST_VALUE, LAST_VALUE and ARRAY_AGG) support an optional ORDER BY argument that defines the order in which they see their input. For example:
❯ create table foo(x int, y int) as values (1, 100),(2, 100),(0, 200);
0 rows in set. Query took 0.003 seconds.
-- note the `ORDER BY x` in the argument to `FIRST_VALUE`
❯ select FIRST_VALUE(x ORDER BY x) from foo GROUP BY y;
+--------------------+
| FIRST_VALUE(foo.x) |
+--------------------+
| 1                  |
| 0                  |
+--------------------+
2 rows in set. Query took 0.008 seconds.
This is not supported today in user defined aggregates.
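To make the semantics of the query above concrete, here is a minimal standalone sketch in plain Rust of what `FIRST_VALUE(x ORDER BY x) ... GROUP BY y` computes. It does not use DataFusion at all; the function name `first_value_ordered_by_x` and the `(x, y)` tuple layout are assumptions made only for illustration.

```rust
use std::collections::BTreeMap;

/// Illustrative only: mimics `SELECT FIRST_VALUE(x ORDER BY x) FROM foo GROUP BY y`
/// over rows given as `(x, y)` tuples. Returns `(y, first x)` pairs, sorted by `y`.
fn first_value_ordered_by_x(rows: &[(i32, i32)]) -> Vec<(i32, i32)> {
    // GROUP BY y: collect the x values of each group.
    let mut groups: BTreeMap<i32, Vec<i32>> = BTreeMap::new();
    for &(x, y) in rows {
        groups.entry(y).or_default().push(x);
    }

    groups
        .into_iter()
        .map(|(y, mut xs)| {
            // ORDER BY x: the aggregate sees its input in ascending x order.
            xs.sort();
            // FIRST_VALUE: every group holds at least one row, so xs[0] exists.
            (y, xs[0])
        })
        .collect()
}
```

Called with the rows of `foo`, that is `[(1, 100), (2, 100), (0, 200)]`, this yields `1` for the `y = 100` group and `0` for the `y = 200` group, matching the query output above. The output row order of the SQL query is not guaranteed; the sketch sorts by `y` only because it uses a `BTreeMap`.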
Describe the solution you'd like
I would like to be able to create a user defined aggregate that can specify its input order.
This would roughly require:
- Extending the AggregateUDFImpl trait to communicate the ordering somehow.
- Updating the implementation of https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.AggregateExpr.html#method.order_bys
- Writing an end-to-end test in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_aggregates.rs showing it all working
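One possible shape for the first step is sketched below as a self-contained toy, not as a proposal for the actual API. Everything in it is hypothetical: the `OrderedAggregate` trait stands in for `AggregateUDFImpl`, `order_by` is an invented method name, and `aggregate_by_y` stands in for the planner and executor. The idea is only that the aggregate declares the ordering it requires and the engine sorts each group before evaluating it.

```rust
use std::collections::BTreeMap;

/// One input row `(x, y)`, matching the `foo` table from the example.
type Row = (i32, i32);

/// Hypothetical stand-in for a user defined aggregate.
/// This is NOT the real `AggregateUDFImpl` trait.
trait OrderedAggregate {
    /// Sort key the aggregate wants its input ordered by.
    /// `None` (the default) means the aggregate is order-insensitive.
    fn order_by(&self) -> Option<fn(&Row) -> i32> {
        None
    }

    /// Folds the rows of one group into a single value.
    fn evaluate(&self, rows: &[Row]) -> Option<i32>;
}

/// Sort key: the `x` column.
fn key_x(row: &Row) -> i32 {
    row.0
}

/// Order-sensitive aggregate, like `FIRST_VALUE(x ORDER BY x)`.
struct FirstX;

impl OrderedAggregate for FirstX {
    fn order_by(&self) -> Option<fn(&Row) -> i32> {
        Some(key_x as fn(&Row) -> i32)
    }

    fn evaluate(&self, rows: &[Row]) -> Option<i32> {
        rows.first().map(|row| row.0)
    }
}

/// Order-insensitive aggregate, like `SUM(x)`; keeps the default `order_by`.
struct SumX;

impl OrderedAggregate for SumX {
    fn evaluate(&self, rows: &[Row]) -> Option<i32> {
        Some(rows.iter().map(|row| row.0).sum())
    }
}

/// Toy "executor": groups rows by `y`, honors the requested ordering,
/// then evaluates the aggregate once per group.
fn aggregate_by_y(agg: &dyn OrderedAggregate, rows: &[Row]) -> Vec<(i32, Option<i32>)> {
    let mut groups: BTreeMap<i32, Vec<Row>> = BTreeMap::new();
    for &row in rows {
        groups.entry(row.1).or_default().push(row);
    }

    groups
        .into_iter()
        .map(|(y, mut group)| {
            // Sort only when the aggregate asked for an ordering.
            if let Some(key) = agg.order_by() {
                group.sort_by_key(|row| key(row));
            }
            (y, agg.evaluate(&group))
        })
        .collect()
}
```

With the input rows `[(2, 100), (1, 100), (0, 200)]`, `aggregate_by_y(&FirstX, ...)` returns `1` for the `y = 100` group even though `2` arrives first, because the group is sorted by `x` before `evaluate` runs. The real change would need to express the ordering in terms of DataFusion's own sort expressions rather than a bare function pointer.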
Here are some other places that likely need to be changed:
https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L242-L252
Looking at how OrderSensitiveArrayAgg is implemented may help: https://github.com/apache/arrow-datafusion/blob/5d70c32a9a4accf21e9f27ff5ed62666cbbcbe54/datafusion/physical-expr/src/aggregate/array_agg_ordered.rs#L45
Describe alternatives you've considered
No response
Additional context
No response