Memory is coupled to `group by` cardinality, even when the aggregate output is truncated by a `limit` clause #7191

### Is your feature request related to a problem or challenge?

Currently, there is only one grouped aggregation stream: `GroupedHashAggregateStream`. It does a lovely job, but it allocates memory for every unique `group by` value, so its memory use grows with the cardinality of the grouping key.

For large datasets, this can cause out-of-memory (OOM) errors, even when the very next operations are `order by max(x) limit y`, which only ever need `y` groups.
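To make the coupling concrete, here is a minimal Rust sketch of a hash-based `max(x) ... group by g`. It is not DataFusion code; `hash_max_by_group` and the `(group, value)` row shape are hypothetical simplifications. One accumulator is kept per distinct key, so memory tracks cardinality no matter what `limit` follows.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified hash aggregation for `select g, max(x) ... group by g`.
/// One entry is kept per distinct group key for the whole input.
fn hash_max_by_group(rows: &[(u64, i64)]) -> HashMap<u64, i64> {
    let mut groups: HashMap<u64, i64> = HashMap::new();
    for &(group, value) in rows {
        // Every previously unseen key allocates a new entry.
        groups
            .entry(group)
            .and_modify(|m| *m = (*m).max(value))
            .or_insert(value);
    }
    groups
}

fn main() {
    // 1_000 distinct groups, even though a query with `limit 3` needs only 3.
    let rows: Vec<(u64, i64)> = (0..10_000u64).map(|i| (i % 1_000, i as i64)).collect();
    let groups = hash_max_by_group(&rows);
    println!("groups held in memory: {}", groups.len()); // prints "groups held in memory: 1000"
}
```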

### Describe the solution you'd like

I would like to add a `GroupedAggregateStream` based on a `PriorityQueue` of grouped values that can be used instead of `GroupedHashAggregateStream` under the specific conditions above (a grouped aggregate whose output is immediately sorted by an aggregate value and truncated by a `limit`), so that Top K queries work even on datasets whose `group by` cardinality exceeds available memory.
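The following is a minimal sketch of the idea for `max`, assuming rows arrive as `(group, value)` pairs; `TopKMax` and its methods are hypothetical names, not a DataFusion API. Because Rust's `std::collections::BinaryHeap` cannot update the priority of an existing entry, the sketch substitutes an ordered `BTreeSet` for the priority queue, paired with a `HashMap` from group to its current max. At most `k` groups are retained. A group that fails to beat the smallest retained max can be discarded safely, because `k` other groups already have a max at least that large.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical bounded aggregator for
/// `select g, max(x) from t group by g order by max(x) desc limit k`.
struct TopKMax {
    k: usize,
    // Current max for each retained group (never more than `k` entries).
    values: HashMap<u64, i64>,
    // (max, group) ordered ascending, so the weakest retained group is first.
    ordered: BTreeSet<(i64, u64)>,
}

impl TopKMax {
    fn new(k: usize) -> Self {
        Self { k, values: HashMap::new(), ordered: BTreeSet::new() }
    }

    fn update(&mut self, group: u64, value: i64) {
        if self.k == 0 {
            return;
        }
        let existing = self.values.get(&group).copied();
        if let Some(old) = existing {
            // Group already retained: raise its max if this row is larger.
            if value > old {
                self.ordered.remove(&(old, group));
                self.ordered.insert((value, group));
                self.values.insert(group, value);
            }
            return;
        }
        if self.values.len() < self.k {
            // Still room: admit the new group unconditionally.
            self.values.insert(group, value);
            self.ordered.insert((value, group));
            return;
        }
        // Full: admit the new group only if it beats the smallest retained max.
        let (min_value, min_group) = *self.ordered.iter().next().expect("k > 0 and full");
        if value > min_value {
            self.ordered.remove(&(min_value, min_group));
            self.values.remove(&min_group);
            self.values.insert(group, value);
            self.ordered.insert((value, group));
        }
    }

    /// Retained groups as (group, max), ordered by max descending.
    fn finish(self) -> Vec<(u64, i64)> {
        self.ordered.into_iter().rev().map(|(v, g)| (g, v)).collect()
    }
}

fn main() {
    let mut agg = TopKMax::new(3);
    for i in 0..10_000u64 {
        agg.update(i % 1_000, i as i64);
    }
    // Only 3 groups were ever held, despite 1_000 distinct keys.
    println!("{:?}", agg.finish()); // prints "[(999, 9999), (998, 9998), (997, 9997)]"
}
```

Memory here is bounded by `k` rather than by the number of distinct groups; the trade-off is that this shortcut only holds for aggregates like `max`/`min` whose value moves monotonically as rows arrive.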

### Describe alternatives you've considered

A more generalized implementation where we:

1. sort by group_val
2. aggregate by group_val, `emit`ting a row to a stream as soon as each group's aggregate is computed
3. feed that into a (new) generalized `TopKExec` node that is _only_ responsible for doing the top K operation

Unfortunately, despite being more general, I'm told that this approach will still OOM in our case.
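A rough Rust sketch of that three-step pipeline follows, assuming `(group, value)` rows and `max` as the aggregate; `streaming_max` and `top_k` are hypothetical names, not DataFusion operators. Steps 2 and 3 hold only one group and `k` rows respectively, but step 1 is done here with an in-memory `sort_by_key` over every row, which illustrates where memory can still be required before any aggregation happens.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical step 2: aggregate input already sorted by group key,
/// emitting (group, max) as soon as each group's run of rows ends.
fn streaming_max<I>(sorted_rows: I) -> impl Iterator<Item = (u64, i64)>
where
    I: IntoIterator<Item = (u64, i64)>,
{
    let mut rows = sorted_rows.into_iter().peekable();
    std::iter::from_fn(move || {
        let (group, mut max) = rows.next()?;
        // Consume the remaining rows of this group.
        while let Some(&(next_group, value)) = rows.peek() {
            if next_group != group {
                break;
            }
            max = max.max(value);
            rows.next();
        }
        Some((group, max))
    })
}

/// Hypothetical step 3: a generic top-K over any stream, holding at most `k` rows.
fn top_k(stream: impl Iterator<Item = (u64, i64)>, k: usize) -> Vec<(u64, i64)> {
    // Min-heap on (max, group): the weakest retained row sits on top.
    let mut heap: BinaryHeap<Reverse<(i64, u64)>> = BinaryHeap::new();
    for (group, max) in stream {
        heap.push(Reverse((max, group)));
        if heap.len() > k {
            heap.pop();
        }
    }
    let mut out: Vec<(u64, i64)> = heap.into_iter().map(|Reverse((m, g))| (g, m)).collect();
    // Present results by max descending (ties broken by group descending).
    out.sort_by(|a, b| b.1.cmp(&a.1).then(b.0.cmp(&a.0)));
    out
}

fn main() {
    // Step 1: the sort materializes every row before aggregation starts.
    let mut rows: Vec<(u64, i64)> = (0..10_000u64).map(|i| (i % 1_000, i as i64)).collect();
    rows.sort_by_key(|&(group, _)| group);
    println!("{:?}", top_k(streaming_max(rows), 3)); // prints "[(999, 9999), (998, 9998), (997, 9997)]"
}
```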

### Additional context

Please see the following similar (but not identical) tickets for related top K issues:

1. https://github.com/apache/arrow-datafusion/issues/7149
2. https://github.com/apache/arrow-datafusion/issues/6937
3. https://github.com/apache/arrow-datafusion/issues/7064
4. https://github.com/apache/arrow-datafusion/issues/6899
