-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Support null safe equals in ExtractEquijoinPredicate #12458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support null safe equals in ExtractEquijoinPredicate #12458
Conversation
korowa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @eejbyfeldt, this looks like a reasonable change.
I've left some comments, and it's probably also worth to check other logical optimizer rules, to ensure that if they somehow modify joins, it won't result in incorrect operator.
| if join.filter.is_none() { | ||
| return Ok(Transformed::no(LogicalPlan::Join(join))); | ||
| } | ||
| let expr = join.filter.as_ref().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will let-else work here to avoid unwrap?
| eq_op: Operator, | ||
| ) -> Result<(Vec<EquijoinPredicate>, Option<Expr>)> { | ||
| let exprs = split_conjunction_owned(filter); | ||
| let exprs = split_conjunction(filter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: this part (along with cloning exprs) in elses of this function was changed in #10165 as a part of "reducing expr cloning in optimizer" initiative, so, probably, in this place it's preferred to avoid cloning, unless it's required.
| filter.input.as_ref(), | ||
| LogicalPlan::Join(Join { | ||
| join_type: JoinType::Inner, | ||
| null_equals_null: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule also has it's own logic for identifying join keys in extract_possible_join_keys function. Shouldn't is not distinct from operator be supported there (perhaps by reusing logic from extract_equijoin_predicated_rule)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, seems like we should probably unify that logic before making this change.
| filter.input.as_ref(), | ||
| LogicalPlan::Join(Join { | ||
| join_type: JoinType::Inner, | ||
| null_equals_null: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also, not sure that this rule shouldn't have any interactions with null_equals_null = true joins -- what is the current behavior of such queries and won't this additional filter change it drastically?
|
|
||
| # IS NOT DISTRINCT can be transformed into equijoin | ||
| query TT | ||
| EXPLAIN SELECT t1_id, t1_int, t2_int FROM t1 JOIN t2 ON t1_id IS NOT DISTINCT from t2_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be useful to also have the following test cases
- demonstrating that equality predicates have priority over "is not distinct from" when picking ON conditions from the filter
- how these joins will work with
eliminate_cross_joinrule (which if I'm not mistaken also works for filter over inner joins event with ON conditions) - how these join will be treated by
push_down_filter-- in some cases it seems to be able to push filters into ON conditions -- if so, resulting query won't be correct in case of null_equals_null joins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions for:
how these join will be treated by push_down_filter -- in some cases it seems to be able to push filters into ON conditions -- if so, resulting query won't be correct in case of null_equals_null joins
I thought these would be pushed into the filter and not the on (https://github.com/eejbyfeldt/datafusion/blob/1d4cf53169fd9a611a7e24e9ac4880d3e24b131e/datafusion/expr/src/logical_plan/plan.rs#L2965-L2968) and therefore still be correct.
|
Converted to draft until I have time to address the feedback and look more into if the requires changes in other rules. |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #12299.
Rationale for this change
Allow more joins to be executed efficiently.
What changes are included in this PR?
Previously ExtractEquijoinPredicate did not inspect the value of
nulls_equals_nullfor the join before extracting equi join predicates, which was likely a possible source of bugs. The new code will respectnull_equals_nullif set (and there is a equi join predicate) and if not it will first try to extraOperator::Eqpredicates and if none is found it will try to extractOperator::IsNotDistinctFromequi predicates.Prev
Are these changes tested?
Yes, using existing and some new test cases.
Are there any user-facing changes?
For queries using
IS NOT DISTINCTmight have improved query plans.