Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Part of #5637
One of the optimizer passes is "common subexpression elimination" (CSE), which removes redundant computation.
However, as @peter-toth noted on #10396, and as the CSE code itself says:
datafusion/datafusion/optimizer/src/common_subexpr_eliminate.rs
Lines 108 to 119 in d58bae4
```rust
/// Identifier for each subexpression.
///
/// Note that the current implementation uses the `Display` of an expression
/// (a `String`) as `Identifier`.
///
/// An identifier should (ideally) be able to "hash", "accumulate", "equal" and "have no
/// collision (as low as possible)"
///
/// Since an identifier is likely to be copied many times, it is better that an identifier
/// is small or "copy". otherwise some kinds of reference count is needed. String description
/// here is not such a good choose.
type Identifier = String;
```
Tracking common subexpressions through string manipulation is non-ideal for several reasons, including the cost of creating those strings.
Describe the solution you'd like
Revisit the identifiers: using string identifiers as the keys of `ExprStats` was not the best choice. Note that this is how CSE has worked since the feature was first added.
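To make the idea concrete, here is a minimal sketch of what a non-string identifier could look like. It is not the DataFusion implementation: the toy `Expr` enum, the `Identifier` struct, and `count_subexprs` are all hypothetical names invented for illustration. The identifier pairs a precomputed hash with a borrowed reference to the expression, so it is small and `Copy`, can be hashed and compared, and falls back to structural equality to rule out hash collisions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy expression tree (hypothetical; a stand-in for a real `Expr` type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Hypothetical identifier: a precomputed hash plus a borrowed expression.
/// It is two words wide and `Copy`, so no `String` is allocated per node.
#[derive(Debug, Clone, Copy)]
struct Identifier<'a> {
    hash: u64,
    expr: &'a Expr,
}

impl<'a> Identifier<'a> {
    fn new(expr: &'a Expr) -> Self {
        let mut hasher = DefaultHasher::new();
        expr.hash(&mut hasher);
        Self { hash: hasher.finish(), expr }
    }
}

impl PartialEq for Identifier<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash comparison first; structural comparison guards against collisions.
        self.hash == other.hash && self.expr == other.expr
    }
}

impl Eq for Identifier<'_> {}

impl Hash for Identifier<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Reuse the precomputed hash instead of walking the tree again.
        state.write_u64(self.hash);
    }
}

/// Count how often each subexpression occurs, keyed by `Identifier`.
fn count_subexprs<'a>(expr: &'a Expr, counts: &mut HashMap<Identifier<'a>, usize>) {
    *counts.entry(Identifier::new(expr)).or_insert(0) += 1;
    match expr {
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            count_subexprs(l, counts);
            count_subexprs(r, counts);
        }
        Expr::Column(_) | Expr::Literal(_) => {}
    }
}
```

For simplicity this sketch rehashes every subtree from scratch in `Identifier::new`; a real pass would more likely accumulate child hashes bottom-up during a single traversal, which is the "accumulate" property the code comment above asks for.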
Describe alternatives you've considered
No response
Additional context
No response