Skip to content

Conversation

@zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Aug 14, 2025

Which issue does this PR close?

Use the upstream arrow-rs coalesce kernel, and support LimitedBatchCoalesce for datafusion

Rationale for this change

Use the upstream arrow-rs coalesce kernel, also it will support LimitedBatchCoalesce for datafusion. There are some future work based this, for example Push limit into joins which will optimize join.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Oct 28, 2025
@zhuqi-lucas zhuqi-lucas force-pushed the test_optimize_performance branch from 8dec409 to d303a4d Compare October 28, 2025 05:23
@zhuqi-lucas
Copy link
Contributor Author

Hi @alamb I am reviving this PR since the upstream arrow-rs has be upgraded, thanks!

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

Hi @alamb I am reviving this PR since the upstream arrow-rs has be upgraded, thanks!

Thank you @zhuqi-lucas -- I was just thinking the other day that we (I really) have dropped the ball on the coalesce kernel. I feel like once we get it into DataFusion we can then drive improvements upstream in arrow...

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (d303a4d) to 2d004af diff using: tpch_mem
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 165.36 ms │                 139.21 ms │ +1.19x faster │
│ QQuery 2     │  26.07 ms │                  27.02 ms │     no change │
│ QQuery 3     │  41.53 ms │                  37.80 ms │ +1.10x faster │
│ QQuery 4     │  27.70 ms │                  28.87 ms │     no change │
│ QQuery 5     │  77.59 ms │                  86.36 ms │  1.11x slower │
│ QQuery 6     │  19.51 ms │                  19.09 ms │     no change │
│ QQuery 7     │ 259.48 ms │                 223.90 ms │ +1.16x faster │
│ QQuery 8     │  35.96 ms │                  32.23 ms │ +1.12x faster │
│ QQuery 9     │ 110.90 ms │                  97.19 ms │ +1.14x faster │
│ QQuery 10    │  59.72 ms │                  63.75 ms │  1.07x slower │
│ QQuery 11    │  17.08 ms │                  17.37 ms │     no change │
│ QQuery 12    │  51.09 ms │                  52.85 ms │     no change │
│ QQuery 13    │  47.12 ms │                  46.62 ms │     no change │
│ QQuery 14    │  13.75 ms │                  13.98 ms │     no change │
│ QQuery 15    │  24.58 ms │                  24.44 ms │     no change │
│ QQuery 16    │  24.85 ms │                  25.24 ms │     no change │
│ QQuery 17    │ 148.38 ms │                 150.48 ms │     no change │
│ QQuery 18    │ 325.68 ms │                 268.59 ms │ +1.21x faster │
│ QQuery 19    │  35.89 ms │                  37.68 ms │     no change │
│ QQuery 20    │  48.33 ms │                  48.37 ms │     no change │
│ QQuery 21    │ 343.21 ms │                 321.81 ms │ +1.07x faster │
│ QQuery 22    │  20.77 ms │                  20.99 ms │     no change │
└──────────────┴───────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 1924.55ms │
│ Total Time (test_optimize_performance)   │ 1783.83ms │
│ Average Time (HEAD)                      │   87.48ms │
│ Average Time (test_optimize_performance) │   81.08ms │
│ Queries Faster                           │         7 │
│ Queries Slower                           │         2 │
│ Queries with No Change                   │        13 │
│ Queries with Failure                     │         0 │
└──────────────────────────────────────────┴───────────┘

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (d303a4d) to 2d004af diff using: clickbench_partitioned
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.08 ms │                   2.22 ms │  1.07x slower │
│ QQuery 1     │    51.76 ms │                  49.19 ms │     no change │
│ QQuery 2     │   138.58 ms │                 136.48 ms │     no change │
│ QQuery 3     │   163.87 ms │                 159.43 ms │     no change │
│ QQuery 4     │  1062.09 ms │                1083.48 ms │     no change │
│ QQuery 5     │  1504.52 ms │                1519.72 ms │     no change │
│ QQuery 6     │     2.13 ms │                   2.17 ms │     no change │
│ QQuery 7     │    54.99 ms │                  56.21 ms │     no change │
│ QQuery 8     │  1453.12 ms │                1434.03 ms │     no change │
│ QQuery 9     │  1771.36 ms │                1795.54 ms │     no change │
│ QQuery 10    │   380.31 ms │                 367.38 ms │     no change │
│ QQuery 11    │   437.98 ms │                 420.04 ms │     no change │
│ QQuery 12    │  1389.21 ms │                1356.83 ms │     no change │
│ QQuery 13    │  2154.30 ms │                2102.75 ms │     no change │
│ QQuery 14    │  1278.18 ms │                1265.16 ms │     no change │
│ QQuery 15    │  1199.29 ms │                1255.16 ms │     no change │
│ QQuery 16    │  2745.53 ms │                2673.37 ms │     no change │
│ QQuery 17    │  2686.26 ms │                2671.54 ms │     no change │
│ QQuery 18    │  5441.17 ms │                5012.66 ms │ +1.09x faster │
│ QQuery 19    │   126.65 ms │                 126.00 ms │     no change │
│ QQuery 20    │  2076.29 ms │                2017.20 ms │     no change │
│ QQuery 21    │  2409.47 ms │                2349.88 ms │     no change │
│ QQuery 22    │  4146.20 ms │                3961.03 ms │     no change │
│ QQuery 23    │ 13031.20 ms │               12723.77 ms │     no change │
│ QQuery 24    │   216.08 ms │                 208.33 ms │     no change │
│ QQuery 25    │   495.59 ms │                 462.90 ms │ +1.07x faster │
│ QQuery 26    │   212.81 ms │                 213.01 ms │     no change │
│ QQuery 27    │  2953.33 ms │                2862.80 ms │     no change │
│ QQuery 28    │ 22788.11 ms │               22901.94 ms │     no change │
│ QQuery 29    │  1012.52 ms │                1010.36 ms │     no change │
│ QQuery 30    │  1325.59 ms │                1303.61 ms │     no change │
│ QQuery 31    │  1339.31 ms │                1388.16 ms │     no change │
│ QQuery 32    │  4638.08 ms │                4939.60 ms │  1.07x slower │
│ QQuery 33    │  5956.32 ms │                5823.50 ms │     no change │
│ QQuery 34    │  6161.05 ms │                6107.98 ms │     no change │
│ QQuery 35    │  2023.78 ms │                1995.03 ms │     no change │
│ QQuery 36    │   121.87 ms │                 123.41 ms │     no change │
│ QQuery 37    │    53.46 ms │                  50.81 ms │     no change │
│ QQuery 38    │   122.16 ms │                 124.81 ms │     no change │
│ QQuery 39    │   202.66 ms │                 198.31 ms │     no change │
│ QQuery 40    │    39.94 ms │                  43.28 ms │  1.08x slower │
│ QQuery 41    │    39.29 ms │                  36.95 ms │ +1.06x faster │
│ QQuery 42    │    31.41 ms │                  32.66 ms │     no change │
└──────────────┴─────────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 95439.88ms │
│ Total Time (test_optimize_performance)   │ 94368.69ms │
│ Average Time (HEAD)                      │  2219.53ms │
│ Average Time (test_optimize_performance) │  2194.62ms │
│ Queries Faster                           │          3 │
│ Queries Slower                           │          3 │
│ Queries with No Change                   │         37 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (d303a4d) to 2d004af diff using: clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2723.99 ms │                2724.43 ms │ no change │
│ QQuery 1     │  1252.31 ms │                1279.79 ms │ no change │
│ QQuery 2     │  2472.33 ms │                2436.89 ms │ no change │
│ QQuery 3     │  1192.10 ms │                1194.83 ms │ no change │
│ QQuery 4     │  2230.66 ms │                2304.44 ms │ no change │
│ QQuery 5     │ 27861.61 ms │               28255.38 ms │ no change │
│ QQuery 6     │  4213.00 ms │                4240.26 ms │ no change │
│ QQuery 7     │  3679.97 ms │                3683.50 ms │ no change │
└──────────────┴─────────────┴───────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 45625.96ms │
│ Total Time (test_optimize_performance)   │ 46119.50ms │
│ Average Time (HEAD)                      │  5703.25ms │
│ Average Time (test_optimize_performance) │  5764.94ms │
│ Queries Faster                           │          0 │
│ Queries Slower                           │          0 │
│ Queries with No Change                   │          8 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘

@zhuqi-lucas
Copy link
Contributor Author

Thank you @alamb for benchmark, the result is good!

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

I agree the benchmarks look good to me (no significant difference in performance)

@zhuqi-lucas would it be ok to update the title of this PR and its description? I think the core change here is to use the upstream 'coalesce' kernel. Is that your understanding too?

@zhuqi-lucas zhuqi-lucas changed the title Testing: Try test optimize performance for coalesce Use the upstream arrow-rs coalesce kernel Oct 29, 2025
@zhuqi-lucas
Copy link
Contributor Author

I agree the benchmarks look good to me (no significant difference in performance)

@zhuqi-lucas would it be ok to update the title of this PR and its description? I think the core change here is to use the upstream 'coalesce' kernel. Is that your understanding too?

Updated the title now, thank you @alamb .

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

@zhuqi-lucas zhuqi-lucas added this pull request to the merge queue Oct 31, 2025
Merged via the queue into apache:main with commit b6e5732 Oct 31, 2025
37 checks passed
@zhuqi-lucas
Copy link
Contributor Author

Thank you @alamb @2010YOUY01 for review! Merged now.

@alamb
Copy link
Contributor

alamb commented Oct 31, 2025

Awesome -- one step closer

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
## Which issue does this PR close?

Use the upstream arrow-rs coalesce kernel, and support
LimitedBatchCoalesce for datafusion

## Rationale for this change

Use the upstream arrow-rs coalesce kernel, also it will support
LimitedBatchCoalesce for datafusion. There are some future work based
this, for example [Push limit into joins
](apache#18295) which will optimize
join.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate Stale PR has not had any activity for some time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants