-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Support binary data types for SortMergeJoin on clause
#17431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jonathanc-n
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @stuartcarnie this looks good to me. I'm surprised binary data isnt part of the join fuzz testing, this could be put in a follow up issue
Sounds good. Where is the fuzz testing, as I was looking for a place to write some test SQL to verify at a higher level via some integration tests. |
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn join_fixed_size_binary() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should include large binary test as well? 🤔 I noticed the amount of unit tests were getting out of hand, maybe I'll look into separating the tests into the other SMJ files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great if you could do so
The fuzz testing is in Integration tests might be overkill but it'd be nice to see if the binary values are working as we would expect. You could write those in the |
|
Here is the file for slt testing SMJ: https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/sort_merge_join.slt |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @stuartcarnie and @jonathanc-n
I added some additional sqllogictests as @jonathanc-n suggested and verified they fail without the code changes in this PR
Details
Completed 1 test files in 0 seconds External error: 4 errors in file /Users/andrewlamb/Software/datafusion/datafusion/sqllogictest/test_files/sort_merge_join.slt
1. query failed: DataFusion error: This feature is not implemented: Unsupported data type in sort merge join comparator: Binary
[SQL] with t1 as (select arrow_cast(x, 'Binary') as x, id1 from t1),
t2 as (select arrow_cast(y, 'Binary') as y, id2 from t2)
select * from t1 join t2 on t1.x = t2.y order by id1, id2
at /Users/andrewlamb/Software/datafusion/datafusion/sqllogictest/test_files/sort_merge_join.slt:850
2. query failed: DataFusion error: This feature is not implemented: Unsupported data type in sort merge join comparator: LargeBinary
[SQL] with t1 as (select arrow_cast(x, 'LargeBinary') as x, id1 from t1),
t2 as (select arrow_cast(y, 'LargeBinary') as y, id2 from t2)
select * from t1 join t2 on t1.x = t2.y order by id1, id2
at /Users/andrewlamb/Software/datafusion/datafusion/sqllogictest/test_files/sort_merge_join.slt:859
3. query failed: DataFusion error: This feature is not implemented: Unsupported data type in sort merge join comparator: BinaryView
[SQL] with t1 as (select arrow_cast(x, 'BinaryView') as x, id1 from t1),
t2 as (select arrow_cast(y, 'BinaryView') as y, id2 from t2)
select * from t1 join t2 on t1.x = t2.y order by id1, id2
at /Users/andrewlamb/Software/datafusion/datafusion/sqllogictest/test_files/sort_merge_join.slt:868
4. query failed: DataFusion error: This feature is not implemented: Unsupported data type in sort merge join comparator: FixedSizeBinary(2)
[SQL] with t1 as (select arrow_cast(arrow_cast(x, 'Binary'), 'FixedSizeBinary(2)') as x, id1 from t1),
t2 as (select arrow_cast(arrow_cast(y, 'Binary'), 'FixedSizeBinary(2)') as y, id2 from t2)
select * from t1 join t2 on t1.x = t2.y order by id1, id2
at /Users/andrewlamb/Software/datafusion/datafusion/sqllogictest/test_files/sort_merge_join.slt:877
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn join_fixed_size_binary() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great if you could do so
I filed a ticket to track adding support to fuzz testing: |
That's great, thanks @alamb! |
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @stuartcarnie looks like a nice improvement 💪
…he#17431) * feat: Support binary data types for `SortMergeJoin` `on` clause * Add sql level tests for merge join on binary keys --------- Co-authored-by: Andrew Lamb <[email protected]>
…he#17431) * feat: Support binary data types for `SortMergeJoin` `on` clause * Add sql level tests for merge join on binary keys --------- Co-authored-by: Andrew Lamb <[email protected]>
…he#17431) * feat: Support binary data types for `SortMergeJoin` `on` clause * Add sql level tests for merge join on binary keys --------- Co-authored-by: Andrew Lamb <[email protected]>
…he#17431) * feat: Support binary data types for `SortMergeJoin` `on` clause * Add sql level tests for merge join on binary keys --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
N/A
Rationale for this change
Adds the ability for the
SortMergeJoinphysical node to join on binary types:Binary,FixedSizeBinaryBinaryViewLargeBinaryWhat changes are included in this PR?
ONclauseAre these changes tested?
✅
Are there any user-facing changes?
The documentation does not list the subset of types supported by the
ONclause ofSortMergeJoin.