-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Describe the bug
An unfortunate pattern in the hash join implementation leads to excessive Arc-cloning: Assume the build-side carries a string-view column as a payload. Let N be the number of batches seen on the build side
-
In the build phase, datafusion concatenates the batches on the build side. The string-view column now holds references to at least N data buffers in a vec;
-
When constructing the output batch, the
takeimplementation for string-views clones the data buffer vector of the concatenated build-side column - thus incrementing the references on all N data buffers.
To Reproduce
I noticed this issue when executing and profiling tpch query 18 - roughly 3% of the runtime is spent cloning these Arcs.
Expected behavior
No response
Additional context
-
The concat during build:
datafusion/datafusion/physical-plan/src/joins/hash_join.rs
Lines 1013 to 1015 in 7002a00
// Merge all batches into a single batch, so we can directly index into the arrays let single_batch = concat_batches(&schema, batches_iter)?; -
The take call during batch construction:
compute::take(array.as_ref(), build_indices, None)? -
The relevant bit of arrow-rs
https://github.com/apache/arrow-rs/blob/7e85b48dc8f929afa82f2878b17db7b2df240b8b/arrow-select/src/take.rs#L565-L567