-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Stop copying LogicalPlan and Exprs in OptimizeProjections (2% faster planning)
#10405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3e8264c to
6bbc24d
Compare
| let table_scan = test_table_scan()?; | ||
| let plan = LogicalPlanBuilder::from(table_scan) | ||
| .project(vec![col("a"), lit(0).alias("d")])? | ||
| .project(vec![col("a").add(lit(1)).alias("a"), lit(0).alias("d")])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change the projection is actually merged
This is due to using the general purpose transform_up rather than special casing certain Expr types in the projection rewrite and now CASE is handled
In order to prevent merging the projections, I needed to make the lower projection non trivial so changed it from a to a + 1 as a
Note that the tests for correctness added for #8942 in
#8960 such as consecutive_projection_same_schema are also still passing
| 02)--SubqueryAlias: NUMBERS | ||
| 03)----Projection: Int64(1) AS a, Int64(2) AS b, Int64(3) AS c | ||
| 04)------EmptyRelation | ||
| 01)SubqueryAlias: NUMBERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these plans are better -- they have one less unecessary Projection (I believe the changes are due to better tracking of child rewrites during optimization)
4a7aa33 to
f80b800
Compare
| } | ||
| } | ||
|
|
||
| /// Recursively potentially multiple aliases from an expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trim_expr renamed and with examples
| ) -> Result<Transformed<LogicalPlan>> { | ||
| // Recursively rewrite any nodes that may be able to avoid computation given | ||
| // their parents' required indices. | ||
| match plan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of handling plan nodes changed so the ones that rewrite the plan directly do so in the first match statement and those that compute required indices do so in a second match
| /// - `Err(error)`: An error occurred during the optimization process. | ||
| fn optimize_projections( | ||
| plan: &LogicalPlan, | ||
| plan: LogicalPlan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fact this function no takes plan rather than plan is the key change that permits this PR to avoid copying as much
ac5c77e to
43d8f0e
Compare
43d8f0e to
b15d1f4
Compare
OptimizeProjectionsOptimizeProjections (2% faster planning)
| 06)------Projection: nodes.id + Int64(1) AS id | ||
| 07)--------Filter: nodes.id < Int64(10) | ||
| 08)----------TableScan: nodes | ||
| 01)SubqueryAlias: nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in fact it removes another projection layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct
| } | ||
| LogicalPlan::Window(window) => { | ||
| let input_schema = window.input.schema(); | ||
| let input_schema = window.input.schema().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this clone?
| let input_schema = inputs[0].schema(); | ||
| // If inputs are not pruned do not change schema | ||
| // TODO this seems wrong (shouldn't we always use the schema of the input?) | ||
| let schema = if schema.fields().len() == input_schema.fields().len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that shouldn't happen.... we can follow up to run tests with only 1 schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it doesn't make sense -- however, it is the same logic as in with_new_exprs:
datafusion/datafusion/expr/src/logical_plan/plan.rs
Lines 913 to 925 in fad16e7
| LogicalPlan::Union(Union { schema, .. }) => { | |
| let input_schema = inputs[0].schema(); | |
| // If inputs are not pruned do not change schema. | |
| let schema = if schema.fields().len() == input_schema.fields().len() { | |
| schema.clone() | |
| } else { | |
| input_schema.clone() | |
| }; | |
| Ok(LogicalPlan::Union(Union { | |
| inputs: inputs.into_iter().map(Arc::new).collect(), | |
| schema, | |
| })) | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #10442
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb for making tha planner better
|
Thanks for the review @comphead fyi @mustafasrepo |
…r planning) (apache#10405) * Add `LogicalPlan::recompute_schema` for handling rewrite passes * Stop copying LogicalPlan and Exprs in `OptimizeProjections`
Note this also has the changes from #10410 in it
Which issue does this PR close?
Closes #10209
Rationale for this change
Make planning faster by not copying as much
What changes are included in this PR?
OptimizeProjectionsto use treenode APIsAre these changes tested?
Existing tests
Are there any user-facing changes?
CASEexpressions)Benchmark Results show a moderate improvement (2% overall in tpch, but some queries like Q3 are like 10% faster)
Details