Allow renaming group-by fields to existing field names #4586
Signed-off-by: Yuanchun Shen <[email protected]>
private boolean isInputRef(RexNode node) {
  return switch (node.getKind()) {
    case AS, DESCENDING, NULLS_FIRST, NULLS_LAST -> {
      final List<RexNode> operands = ((RexCall) node).operands;
      yield isInputRef(operands.getFirst());
    }
    default -> node instanceof RexInputRef;
  };
}
Can PlanUtil.getInputRefs be used to replace this?
I think they serve different purposes. PlanUtil.getInputRefs collects every input ref a node refers to; if a node refers to multiple inputs, it returns all of them. Here I only want to check whether the node itself is an input ref (optionally aliased), keeping the node as is.
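To make the distinction concrete, here is a minimal, Calcite-free sketch. `Node`, `InputRef`, `Call`, and `Literal` are hypothetical stand-ins for `RexNode`, `RexInputRef`, `RexCall`, and `RexLiteral`, and call kinds such as `SPAN` or `PLUS` are plain strings invented for illustration. It only contrasts the two questions; it is not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model contrasting "collect all input refs" with "is this an aliased input ref". */
final class InputRefCheck {
  /** Stand-in for RexNode. */
  interface Node {}

  /** Stand-in for RexInputRef: a reference to input field #index. */
  static final class InputRef implements Node {
    final int index;
    InputRef(int index) { this.index = index; }
  }

  /** Stand-in for RexCall; kind is a plain string such as "AS" or "SPAN". */
  static final class Call implements Node {
    final String kind;
    final List<Node> operands;
    Call(String kind, Node... operands) {
      this.kind = kind;
      this.operands = List.of(operands);
    }
  }

  /** Stand-in for RexLiteral, e.g. the alias name that is the second operand of AS. */
  static final class Literal implements Node {
    final Object value;
    Literal(Object value) { this.value = value; }
  }

  /** Wrapper kinds that are looked through, mirroring the case labels of isInputRef. */
  private static final List<String> WRAPPERS =
      List.of("AS", "DESCENDING", "NULLS_FIRST", "NULLS_LAST");

  /** True only if the node is an input ref, possibly wrapped in AS/DESCENDING/NULLS_*. */
  static boolean isInputRef(Node node) {
    if (node instanceof Call && WRAPPERS.contains(((Call) node).kind)) {
      return isInputRef(((Call) node).operands.get(0));
    }
    return node instanceof InputRef;
  }

  /** Collects the index of every input ref anywhere in the tree (a getInputRefs-style question). */
  static List<Integer> collectInputRefs(Node node) {
    List<Integer> out = new ArrayList<>();
    collect(node, out);
    return out;
  }

  private static void collect(Node node, List<Integer> out) {
    if (node instanceof InputRef) {
      out.add(((InputRef) node).index);
    } else if (node instanceof Call) {
      for (Node operand : ((Call) node).operands) {
        collect(operand, out);
      }
    }
  }
}
```

In this model, `AS(InputRef(2), "value")` passes `isInputRef`, while `AS(SPAN(InputRef(2), 2000), "value")` does not, even though `collectInputRefs` finds input 2 in both.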
// During aggregation, Calcite projects both input dependencies and output group-by fields.
// When names conflict, Calcite adds numeric suffixes (e.g., "value0").
// Apply explicit renaming to restore the intended aliases.
if (names.size() == reResolved.getLeft().size()) {
When would names.size() not equal reResolved.getLeft().size()? The condition seems to always be true.
The lengths differ when a group key is not aliased; in that case extractAliasLiteral returns empty:

private Optional<RexLiteral> extractAliasLiteral(RexNode node) {
  if (node == null) {
    return Optional.empty();
  } else if (node.getKind() == AS) {
    return Optional.of((RexLiteral) ((RexCall) node).getOperands().get(1));
  } else {
    return Optional.empty();
  }
}

Although all group keys appear to be aliased in practice, this defensive check guards against future changes that could otherwise produce mismatched renaming. Should I remove it?
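A hypothetical model of why the two lengths can differ: a key without an alias contributes no name, so the alias list comes out shorter than the key list and the check fails. `GroupKey`, `extractAlias`, `aliasNames`, and `canRename` are illustrative names, and aliases are plain strings rather than `RexLiteral`s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of the size check guarding the post-aggregation rename. */
final class AliasGuard {
  /** Illustrative group key: expression text plus an optional alias (null if not aliased). */
  static final class GroupKey {
    final String expr;
    final String alias;
    GroupKey(String expr, String alias) {
      this.expr = expr;
      this.alias = alias;
    }
  }

  /** Mirrors extractAliasLiteral: empty for a null or unaliased key. */
  static Optional<String> extractAlias(GroupKey key) {
    if (key == null || key.alias == null) {
      return Optional.empty();
    }
    return Optional.of(key.alias);
  }

  /** Collects the aliases that exist; unaliased keys are skipped, so the result may be shorter. */
  static List<String> aliasNames(List<GroupKey> keys) {
    List<String> names = new ArrayList<>();
    for (GroupKey key : keys) {
      extractAlias(key).ifPresent(names::add);
    }
    return names;
  }

  /** The defensive check: rename only when every key contributed exactly one alias. */
  static boolean canRename(List<GroupKey> keys) {
    return aliasNames(keys).size() == keys.size();
  }
}
```

With one aliased span key the check passes; adding an unaliased key makes the alias list one element short, so no positional renaming is attempted.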
Pair<List<RexNode>, List<AggCall>> reResolved =
    resolveAttributesForAggregation(groupExprList, aggExprList, context);

List<String> names = getGroupKeyNamesAfterAggregation(reResolved.getLeft());
Can you rename the variables to make them more meaningful?
Renamed
 * Imitates {@code Registrar.registerExpression} of {@link RelBuilder} to derive the output order
 * of group-by keys after aggregation.
 *
 * <p>The projected input reference comes first, while any other computed expression follows.
In Registrar.registerExpression, it seems the other computed expressions are not guaranteed to keep their original order when expressions are duplicated.
But since PPL only allows a span expression in the group-by clause, and it cannot be combined with another span expression, this logic may be correct; I haven't found a bad case so far.
I found a bad case: stats count() by value, value, @timestamp. I'll fix it.
Update: fixed by checking for duplicated group keys.
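A simplified sketch of the ordering rule stated in the javadoc, including the duplicate-key fix. It assumes that keys are deduplicated by name (the real code works on expressions), that duplicates collapse to their first occurrence, and that each category keeps its relative order; `Key` and `namesAfterAggregation` are hypothetical, and Calcite's actual ordering may differ in details.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model: input refs first, computed expressions after, duplicates removed. */
final class GroupKeyOrder {
  /** Illustrative group key: its name and whether it is a plain (optionally aliased) input ref. */
  static final class Key {
    final String name;
    final boolean inputRef;
    Key(String name, boolean inputRef) {
      this.name = name;
      this.inputRef = inputRef;
    }
  }

  static List<String> namesAfterAggregation(List<Key> keys) {
    // Deduplicate first; in this model, "by value, value, @timestamp" would otherwise
    // produce three names for only two distinct group keys.
    Set<String> seen = new HashSet<>();
    List<Key> unique = new ArrayList<>();
    for (Key key : keys) {
      if (seen.add(key.name)) {
        unique.add(key);
      }
    }
    // Plain input references come first...
    List<String> ordered = new ArrayList<>();
    for (Key key : unique) {
      if (key.inputRef) {
        ordered.add(key.name);
      }
    }
    // ...followed by computed expressions such as span(...).
    for (Key key : unique) {
      if (!key.inputRef) {
        ordered.add(key.name);
      }
    }
    return ordered;
  }
}
```

For example, keys `[span(@timestamp, 1h) as ts, value]` come out as `[value, ts]`, and `[value, value, @timestamp]` collapses to `[value, @timestamp]`.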
/** Whether a rex node is an aliased input reference */
private boolean isInputRef(RexNode node) {
  return switch (node.getKind()) {
    case AS, DESCENDING, NULLS_FIRST, NULLS_LAST -> {
Is there any case where DESCENDING, NULLS_FIRST, or NULLS_LAST appears in our stats ... by ... command?
No, I didn't manage to create any. It seems there is always a projection after sorting and before aggregation.
For example:
LogicalAggregate(group=[{0}], count()=[COUNT()])
  LogicalProject(value=[$2])
    LogicalSort(sort0=[$2], dir0=[DESC-nulls-last])
…names (#4653)
* Allow renaming group-by fields to existing field names (#4586)
  * Rename fields to intended ones after aggregation
    Signed-off-by: Yuanchun Shen <[email protected]>
  * Add a defense check
    Signed-off-by: Yuanchun Shen <[email protected]>
  * Remove defense check
    Signed-off-by: Yuanchun Shen <[email protected]>
  * Handle cases where there exist duplicated group keys
    Signed-off-by: Yuanchun Shen <[email protected]>

  ---------
  Signed-off-by: Yuanchun Shen <[email protected]>
  (cherry picked from commit a86a5a7)
  Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Downgrade language level to java 11
  Signed-off-by: Yuanchun Shen <[email protected]>

---------
Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yuanchun Shen <[email protected]>
Description
This PR fixes a bug in Calcite-enabled PPL queries where group-by fields cannot be aliased to their original field names, causing queries to fail with "field not found" errors.
When Calcite is enabled, PPL queries that use span functions with aliases matching the original field names fail with errors like:
field [value] not found; input fields are: [value0, count()]
Affected Query Patterns:
source=time_test | stats count() by span(value, 2000) as value
source=time_test | stats count() by span(timestamp, 1h) as timestamp
Root Cause Analysis
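The issue occurs during Calcite's aggregation processing: Calcite projects both the input dependencies and the output group-by fields, and when their names conflict it adds numeric suffixes (e.g., "value0"), so the intended alias no longer exists after aggregation.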
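A rough model of the suffixing behind the error: the projection below the aggregate holds both the original `value` field and the span key aliased `value`, so the later name is suffixed. `NameSuffixing.uniquify` is a hypothetical helper that mimics the observed `value0` naming; it is not Calcite's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of conflict suffixing: later duplicates become name0, name1, ... */
final class NameSuffixing {
  static List<String> uniquify(List<String> names) {
    Set<String> used = new HashSet<>();
    List<String> out = new ArrayList<>();
    for (String name : names) {
      String candidate = name;
      // Try name0, name1, ... until an unused name is found.
      for (int i = 0; used.contains(candidate); i++) {
        candidate = name + i;
      }
      used.add(candidate);
      out.add(candidate);
    }
    return out;
  }
}
```

In this model, `[timestamp, value, value]` becomes `[timestamp, value, value0]`. After aggregation only the group key `value0` and `count()` remain, which matches the "field [value] not found" error above.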
Solution Implementation
This PR implements a post-aggregation field renaming strategy: after aggregation, the group-by fields are explicitly renamed back to their intended aliases.
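A minimal sketch of the renaming step, relying on the fact that a Calcite aggregate emits its group-by keys before its aggregate calls, so the first k output fields can be overwritten with the k intended aliases. `PostAggregationRename.rename` is hypothetical and works on plain strings; a real implementation would rename the fields of the aggregate's output row type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: replace the leading group-key names with the intended aliases. */
final class PostAggregationRename {
  static List<String> rename(List<String> fieldsAfterAggregate, List<String> intendedGroupNames) {
    if (intendedGroupNames.size() > fieldsAfterAggregate.size()) {
      throw new IllegalArgumentException("more aliases than output fields");
    }
    List<String> renamed = new ArrayList<>(fieldsAfterAggregate);
    // Group keys occupy the first positions; aggregate calls such as count() keep their names.
    for (int i = 0; i < intendedGroupNames.size(); i++) {
      renamed.set(i, intendedGroupNames.get(i));
    }
    return renamed;
  }
}
```

For example, `rename([value0, count()], [value])` yields `[value, count()]`, so a later reference to `value` resolves again.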
Related Issues
Resolves #4580
Check List
Commits are signed per the DCO using --signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.