#17801 Improve nullability reporting of case expressions #17813
Open
pepijnve wants to merge 28 commits into apache:main from pepijnve:issue_17801
+1,735
−22
408cee1 #17801 Improve nullability reporting of case expressions (pepijnve)
045fc9c #17801 Clarify logical expression test cases (pepijnve)
de8b780 #17801 Attempt to clarify const evaluation logic (pepijnve)
bbd2949 #17801 Extend predicate const evaluation (pepijnve)
2075f4b #17801 Correctly report nullability of implicit casts in predicates (pepijnve)
8c87937 #17801 Code formatting (pepijnve)
e155d41 Merge branch 'main' into issue_17801 (alamb)
5cfe8b6 Merge branch 'main' into issue_17801 (alamb)
ac4267c Add comment explaining why the logical plan optimizer is triggered (pepijnve)
101db28 Simplify predicate eval code (pepijnve)
f4c8579 Code formatting (pepijnve)
81b6ec1 Add license header (pepijnve)
b6ebd13 Merge branch 'main' into issue_17801 (alamb)
ebc2d38 Merge branch 'refs/heads/main' into issue_17801 (pepijnve)
3131899 Try to align logical and physical implementations as much as possible (pepijnve)
3da92e5 Allow optimizations to change fields from nullable to not-nullable (pepijnve)
0a6b2e7 Correctly handle case-with-expression nullability analysis (pepijnve)
113e899 Add unit tests for predicate_eval (pepijnve)
9dee1e8 Another attempt to make the code easier to read (pepijnve)
4a22dfc Rework predicate_eval to use set arithmetic (pepijnve)
a1bc263 Rename predicate_eval to predicate_bounds (pepijnve)
ac765e9 Add unit tests for NullableInterval::is_certainly_... (pepijnve)
51af749 Formatting (pepijnve)
4af84a7 Simplify logical and physical case branch filtering logic (pepijnve)
427fc30 Further simplification of `is_null` (pepijnve)
0223a54 Merge remote-tracking branch 'upstream/HEAD' into issue_17801 (pepijnve)
c5914d6 Update bitflags version declaration to match arrow-schema (pepijnve)
4b879e4 Silence "needless pass by value" lint (pepijnve)
@@ -60,6 +60,7 @@ use crate::schema_equivalence::schema_satisfied_by;
 use arrow::array::{builder::StringBuilder, RecordBatch};
 use arrow::compute::SortOptions;
 use arrow::datatypes::Schema;
+use arrow_schema::Field;
 use datafusion_catalog::ScanArgs;
 use datafusion_common::display::ToStringifiedPlan;
 use datafusion_common::format::ExplainAnalyzeLevel;
@@ -2516,7 +2517,9 @@ impl<'a> OptimizationInvariantChecker<'a> {
         previous_schema: Arc<Schema>,
     ) -> Result<()> {
         // if the rule is not permitted to change the schema, confirm that it did not change.
-        if self.rule.schema_check() && plan.schema() != previous_schema {
+        if self.rule.schema_check()
+            && !is_allowed_schema_change(previous_schema.as_ref(), plan.schema().as_ref())
+        {
             internal_err!("PhysicalOptimizer rule '{}' failed. Schema mismatch. Expected original schema: {:?}, got new schema: {:?}",
                 self.rule.name(),
                 previous_schema,
@@ -2532,6 +2535,33 @@ impl<'a> OptimizationInvariantChecker<'a> {
     }
 }
 
+/// Checks if the change from `old` schema to `new` is allowed or not.
+/// The current implementation only allows nullability of individual fields to change
+/// from 'nullable' to 'not nullable'.

Contributor (review comment, with a suggested change): I think adding rationale about why would be helpful here, as it may not be immediately obvious.

+fn is_allowed_schema_change(old: &Schema, new: &Schema) -> bool {
+    if new.metadata != old.metadata {
+        return false;
+    }
+
+    if new.fields.len() != old.fields.len() {
+        return false;
+    }
+
+    let new_fields = new.fields.iter().map(|f| f.as_ref());
+    let old_fields = old.fields.iter().map(|f| f.as_ref());
+    old_fields
+        .zip(new_fields)
+        .all(|(old, new)| is_allowed_field_change(old, new))
+}
+
+fn is_allowed_field_change(old_field: &Field, new_field: &Field) -> bool {
+    new_field.name() == old_field.name()
+        && new_field.data_type() == old_field.data_type()
+        && new_field.metadata() == old_field.metadata()
+        && (new_field.is_nullable() == old_field.is_nullable()
+            || !new_field.is_nullable())
+}
+
 impl<'n> TreeNodeVisitor<'n> for OptimizationInvariantChecker<'_> {
     type Node = Arc<dyn ExecutionPlan>;
 
@@ -341,6 +341,11 @@ pub fn is_null(expr: Expr) -> Expr {
     Expr::IsNull(Box::new(expr))
 }
 
+/// Create is not null expression

Contributor (review comment): This is a nice drive-by cleanup.

+pub fn is_not_null(expr: Expr) -> Expr {
+    Expr::IsNotNull(Box::new(expr))
+}
+
 /// Create is true expression
 pub fn is_true(expr: Expr) -> Expr {
     Expr::IsTrue(Box::new(expr))
This change relaxes the schema check slightly. It now allows individual fields to change from nullable to not-nullable, which is OK because the new schema then only allows a strict subset of what the original schema allowed.
`schema_check` has documentation stating that you should disable the schema check entirely if you want to do this. It seemed better not to have to disable checking entirely.
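To make the "strict subset" argument concrete, here is a minimal, standard-library-only sketch of the relaxed check. The `Field` and `Schema` structs and the `field` and `schema` helpers are hypothetical stand-ins for the Arrow types, introduced only for illustration, and the data type is modelled as a plain string. Only the shape of the two predicates follows the diff in this pull request.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `arrow_schema::Field`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String, // stand-in for Arrow's `DataType`
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Hypothetical, simplified stand-in for `arrow_schema::Schema`.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

/// Illustration-only helper: builds a field without metadata.
fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
        metadata: HashMap::new(),
    }
}

/// Illustration-only helper: builds a schema without metadata.
fn schema(fields: Vec<Field>) -> Schema {
    Schema { fields, metadata: HashMap::new() }
}

/// A field may stay unchanged or tighten from nullable to not nullable.
/// Everything else (name, type, metadata) must be identical.
fn is_allowed_field_change(old: &Field, new: &Field) -> bool {
    new.name == old.name
        && new.data_type == old.data_type
        && new.metadata == old.metadata
        // Rejects only the "not nullable -> nullable" direction.
        && (new.nullable == old.nullable || !new.nullable)
}

/// Schema-level check: same metadata, same number of fields, and every
/// field pair (matched by position) is an allowed change.
fn is_allowed_schema_change(old: &Schema, new: &Schema) -> bool {
    new.metadata == old.metadata
        && new.fields.len() == old.fields.len()
        && old
            .fields
            .iter()
            .zip(new.fields.iter())
            .all(|(o, n)| is_allowed_field_change(o, n))
}
```

With these definitions, a schema in which field `a` goes from nullable to not nullable passes the check against the original. The reverse direction is rejected: a consumer that relied on a field never being null could otherwise receive nulls.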