Closed
Labels: arrow, enhancement, performance
Description
Is your feature request related to a problem or challenge?
I ran some benchmarks in DataFusion and saw that `interleave_views` takes up a large share of the time in the sorting benchmark (sort_tpch).
In the profile it accounts for roughly 17% of all samples. SortPreservingMergeExec accounts for 77% of the samples, so `interleave_views` is about 22% (17/77) of the time spent in SortPreservingMergeExec.
Looking at the samples, a lot of that time is spent maintaining a hash map: hashing, rehashing as it grows, allocating, etc.
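To make the profile easier to reason about, here is a simplified, std-only Rust model of the pattern it points at. It assumes the hash map's job is to remap each `(input array, input buffer)` pair to an output buffer index, so that every data buffer referenced by a long (more than 12 bytes) value is carried into the result exactly once, while short inline values are copied as-is. `View`, `ViewArray`, `Interleaved`, and `interleave_with_hashmap` are hypothetical names for this sketch, not arrow-rs APIs.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical simplified types (not arrow-rs's ByteView / GenericByteViewArray).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum View {
    /// Short value stored inline; the payload is omitted in this sketch.
    Inline { len: u32 },
    /// Long value stored at `offset..offset + len` in `buffers[buffer_index]`.
    Ref { buffer_index: u32, offset: u32, len: u32 },
}

pub struct ViewArray {
    pub views: Vec<View>,
    pub buffers: Vec<Arc<[u8]>>,
}

pub struct Interleaved {
    pub views: Vec<View>,
    pub buffers: Vec<Arc<[u8]>>,
}

/// Take rows `(array_idx, row_idx)` from `arrays`, remapping buffer indices
/// through a HashMap so each referenced input buffer appears once in the output.
pub fn interleave_with_hashmap(arrays: &[ViewArray], indices: &[(usize, usize)]) -> Interleaved {
    let mut views = Vec::with_capacity(indices.len());
    let mut buffers: Vec<Arc<[u8]>> = Vec::new();
    // (input array, input buffer) -> output buffer index.
    // Starts empty, so it reallocates and rehashes as new buffers are discovered.
    let mut lookup: HashMap<(usize, u32), u32> = HashMap::new();
    for &(a, r) in indices {
        let view = arrays[a].views[r];
        match view {
            // Inline values need no buffer, so they are copied unchanged.
            View::Inline { .. } => views.push(view),
            View::Ref { buffer_index, offset, len } => {
                // Every long value pays for a hash and a probe here.
                let new_index = *lookup.entry((a, buffer_index)).or_insert_with(|| {
                    buffers.push(Arc::clone(&arrays[a].buffers[buffer_index as usize]));
                    (buffers.len() - 1) as u32
                });
                views.push(View::Ref { buffer_index: new_index, offset, len });
            }
        }
    }
    Interleaved { views, buffers }
}
```

In this model, the per-value hashing plus the map's growth would be consistent with the hashing, rehashing, and allocation frames seen in the samples.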

Describe the solution you'd like
I am not 100% sure what purpose the hash map serves here, but we should be able to optimize this to a great extent.
I think this can be combined with the improvements being made to `concat` and `coalesce` (cc @alamb).
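One possible direction, sketched under stated assumptions: the types are hypothetical simplified stand-ins (not the arrow-rs API), the hash map is assumed to remap `(input array, input buffer)` pairs to output buffer indices, and this is not necessarily what an eventual fix would look like. Because that key space is small and known up front (the total number of input buffers), the hash map can be replaced by a dense lookup table allocated once and indexed through a per-array prefix offset. `interleave_with_dense_lookup` is a hypothetical name.

```rust
use std::sync::Arc;

// Hypothetical simplified types (not arrow-rs's ByteView / GenericByteViewArray).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum View {
    /// Short value stored inline; the payload is omitted in this sketch.
    Inline { len: u32 },
    /// Long value stored at `offset..offset + len` in `buffers[buffer_index]`.
    Ref { buffer_index: u32, offset: u32, len: u32 },
}

pub struct ViewArray {
    pub views: Vec<View>,
    pub buffers: Vec<Arc<[u8]>>,
}

pub struct Interleaved {
    pub views: Vec<View>,
    pub buffers: Vec<Arc<[u8]>>,
}

/// Take rows `(array_idx, row_idx)` from `arrays`, remapping buffer indices
/// through a flat table instead of a HashMap; each referenced buffer appears once.
pub fn interleave_with_dense_lookup(arrays: &[ViewArray], indices: &[(usize, usize)]) -> Interleaved {
    // base[a] = position of array a's first buffer in the flat table (prefix sum).
    let mut base = Vec::with_capacity(arrays.len());
    let mut total = 0;
    for array in arrays {
        base.push(total);
        total += array.buffers.len();
    }
    // lookup[base[a] + b] = output index of buffer b of array a, once referenced.
    // Allocated once; no hashing and no rehashing.
    let mut lookup: Vec<Option<u32>> = vec![None; total];
    let mut views = Vec::with_capacity(indices.len());
    let mut buffers = Vec::new();
    for &(a, r) in indices {
        let view = arrays[a].views[r];
        match view {
            // Inline values need no buffer, so they are copied unchanged.
            View::Inline { .. } => views.push(view),
            View::Ref { buffer_index, offset, len } => {
                let slot = &mut lookup[base[a] + buffer_index as usize];
                // First reference to this buffer: clone the Arc (cheap) and record its new index.
                let new_index = *slot.get_or_insert_with(|| {
                    buffers.push(Arc::clone(&arrays[a].buffers[buffer_index as usize]));
                    (buffers.len() - 1) as u32
                });
                views.push(View::Ref { buffer_index: new_index, offset, len });
            }
        }
    }
    Interleaved { views, buffers }
}
```

A further simplification would be to skip the lookup entirely, append every input buffer, and remap with `base[a] + buffer_index`. That removes the per-value branch, but it can carry buffers that no selected row references into the output, so it trades memory for speed.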
Describe alternatives you've considered
No response
Additional context
No response