[Parquet] Adaptive Parquet Predicate Pushdown #8733
Conversation
Cargo bench results; emoji added for readability: 🟢 means not worse, 👍🏻 means a 20%+ improvement.
😮 thank you @hhhizzz -- I plan to review this PR carefully, but it will likely take me a few days
fyi @zhuqi-lucas and @XiangpengHao |
```rust
fn new(selectors: Vec<RowSelector>) -> Self {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let selector_count = selectors.len();
    const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 8;
```
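For context, the adaptive choice under discussion reduces to comparing the average selector run length against this threshold. A minimal sketch of that decision, assuming the `RowSelectionStrategy` enum used in the benchmarks below (the exact logic here is my reading of the snippet, not a verbatim copy of the PR):

```rust
// Illustrative sketch only: short average runs favor a bitmap mask (cheap
// random access), long runs favor RowSelectors (cheap bulk skips).
enum RowSelectionStrategy {
    Mask,      // materialize a boolean mask and filter decoded batches
    Selectors, // replay skip/read runs against the page decoder
}

fn choose_strategy(total_rows: usize, selector_count: usize) -> RowSelectionStrategy {
    const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 8;
    // Average run length = selected + skipped rows, divided by number of runs.
    if selector_count > 0 && total_rows / selector_count <= AVG_SELECTOR_LEN_MASK_THRESHOLD {
        RowSelectionStrategy::Mask
    } else {
        RowSelectionStrategy::Selectors
    }
}
```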
@alamb It looks similar to my original implementation, which used a fixed choice.
But it's more reasonable in this PR: I added a benchmark to determine this threshold value (8).
I added a bench in the code; you can also try it on your machine. I find it varies heavily across platforms: on my Mac it's 8, but on my x86 PC the value can be set to around 30.
Nice @hhhizzz, I am wondering if we can change to a more stable choice, such as a statistics-based one, but it's a good start for this PR.
That's a great idea; let me do some more investigation on different platforms and put the results here.
First of all, thank you so much @hhhizzz -- I think this is a really nice change and the code is well structured and a pleasure to read. Also thank you to @zhuqi-lucas for setting the stage for much of this work.
Given the performance results so far (basically as good as or better than the existing code) I think this PR is almost ready to go.
The only thing I am not sure about is the null page / skipping thing -- I left more comments inline
I think there are several additional improvements that could be done as follow on work:
- The heuristic for when to use the masking strategy can likely be improved based on the types of values being filtered (for example the number of columns or the inclusion of StringView)
- Avoid creating RowSelection just to turn it back into a BooleanArray (I left comments inline)
```rust
    false,
)]));
let values = Int32Array::from_iter_values((0..total_rows).map(|v| v as i32));
let columns: Vec<ArrayRef> = vec![Arc::new(values) as ArrayRef];
```
I recommend we also test with some variable-length rows -- the selection overhead may be different for StringArray/StringViewArray than for an i32.
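A hedged sketch of what such a variable-length column could look like, mirroring the Int32 setup above (`string_view_column` is a hypothetical helper, not code from this PR):

```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, StringViewArray};

// Build a StringViewArray column with one distinct value per row, analogous
// to the Int32Array column in the existing benchmark setup.
fn string_view_column(total_rows: usize) -> Vec<ArrayRef> {
    let strings = StringViewArray::from_iter_values(
        (0..total_rows).map(|v| format!("row-{v:08}")),
    );
    vec![Arc::new(strings) as ArrayRef]
}
```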
| ("read_selectors", RowSelectionStrategy::Selectors), | ||
| ]; | ||
|
|
||
| fn criterion_benchmark(c: &mut Criterion) { |
I found this code quite clear and easy to read -- thank you 🙏
I do think it would be good if we could add some of the background context here as a comment.
Specifically, it is not obvious from just the code that this benchmark can be used to determine the value of AVG_SELECTOR_LEN_MASK_THRESHOLD -- perhaps you can reuse some of the description from this PR?
Also, how did you generate these charts? If it is straightforward perhaps you can also describe that in the comments
#8733 (comment)
```rust
let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
let selector_count = selectors.len();
const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 16;
```
I recommend we pull this constant somewhere that is easier to find along with a comment about what it is and how it was chosen. I suggest simply making it a constant in this module
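For illustration, a documented module-level constant might look like the following (the wording is a sketch based on this discussion, not text from the PR):

```rust
/// Average selector run length at or below which the reader switches from
/// RowSelector-based skipping to a boolean-mask strategy.
///
/// Chosen empirically via the row-selection benchmark; the best value varies
/// by platform (8 on the author's Mac, around 30 on an x86 PC).
const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 16;
```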
```rust
let mut cursor = start_position;
let mut initial_skip = 0;

while cursor < mask.len() && !mask.value(cursor) {
```
I suspect there are all sorts of bit-level hacks we can do to make this faster (as a follow-on PR) - for example leveraging code that counts the number of 1s in a u64 at a time.
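A sketch of the kind of word-at-a-time trick being suggested, operating directly on the mask's underlying `u64` words (hypothetical helper, not code from this PR):

```rust
// Find the first set bit at or after `start`, scanning whole u64 words
// instead of probing the mask one bit at a time.
fn first_set_bit_from(words: &[u64], start: usize, len: usize) -> Option<usize> {
    let mut bit = start;
    while bit < len {
        let word_idx = bit / 64;
        // Clear bits below `bit` within the current word (shift is < 64).
        let word = words[word_idx] & (u64::MAX << (bit % 64));
        if word != 0 {
            let found = word_idx * 64 + word.trailing_zeros() as usize;
            return (found < len).then_some(found);
        }
        // The rest of this word is zero; jump to the next word boundary.
        bit = (word_idx + 1) * 64;
    }
    None
}
```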
```rust
    }
}

fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
```
I think we can do even better than this (as a follow on PR)
The current code still converts the result of a filter (BooleanArray) to a RowSelection,
https://github.com/apache/arrow-rs/blob/cc1444a3232fa11b8485e2794a88f342bd7f97e2/parquet/src/arrow/arrow_reader/read_plan.rs#L113-L112
and then boolean_mask_from_selectors converts it back to a BooleanArray
However I think we could apply the result of evaluating the filter directly to a RowSelectionBacking::Mask
In fact, @XiangpengHao even has some (relatively crazy) techniques to combine masks quickly in #6624 (comment)
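For reference, the selector-to-mask leg of that round trip is roughly the following (a sketch assuming `RowSelector`'s public `row_count`/`skip` fields; the PR's actual `boolean_mask_from_selectors` may differ in detail):

```rust
use arrow_buffer::{BooleanBuffer, BooleanBufferBuilder};
use parquet::arrow::arrow_reader::RowSelector;

// Expand run-length selectors into a dense boolean mask: selected runs
// become 1-bits, skipped runs become 0-bits.
fn mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
    let total: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut builder = BooleanBufferBuilder::new(total);
    for s in selectors {
        builder.append_n(s.row_count, !s.skip);
    }
    builder.finish()
}
```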
parquet/src/column/reader.rs
Outdated
```rust
let remaining_records = max_records - total_records_read;
let remaining_levels = self.num_buffered_values - self.num_decoded_values;

if self.synthetic_page {
```
I don't understand the need for the synthetic page -- it seems like a workaround for some case that should be handled in the control flow loop in ParquetRecordBatchReader::next_inner
Specifically, given that skipping / scanning data pages works with the RowSelection approach, why does a mask approach cause additional problems? In some ways the mask approach should actually decode more rows, not fewer (as the filter is applied afterwards).
Thanks for the thorough code review!
Yes -- this is the trickiest part of the PR. When no pages are skipped, everything works as expected. But some pages can be skipped during row-group construction when using the sparse ColumnChunkData, meaning their values and definition/repetition levels are never read. Row selection still works because skip_records() handles this case and skips the page accordingly.
However, with the boolean-array design, all values must be read and decoded before filtering. ParquetRecordBatchReader is a streaming reader; it has no concept of pages, so we can't rely on page size to drive skipping there. I think the most practical approach, therefore, is to return dummy null values as placeholders for the skipped pages. If I missed something or there's a better way to do this, just let me know. 😊
A simple example:
- the page size is 2 and the mask is 100001, so the row selection would be read(1) skip(4) read(1)
- the ColumnChunkData would be page1(10), page2(skipped), page3(01)
- using the row selection to skip(4), page2 won't be read at all
- but using the bit mask, all 6 values need to be read, and page2 is not in memory, which is why I need to construct the synthetic page
For completeness, I prototyped reconstructing the readers to handle skipped pages directly, but it introduces a breaking change: every array_reader would need a page-size parameter. That's undesirable: users shouldn't need page-level details just to read Parquet.
I am similarly still confused.
@hhhizzz your explanation makes sense to me in theory, but I just tested out removing the synthetic page code from this PR and the tests all still seem to pass. So that means we either have a testing gap or there is something else going on:
I looked more carefully, and it seems to me that the calculation of what pages to fetch is still based on RowSelection (not the RowSelectionCursor / RowSelectionBacking):
arrow-rs/parquet/src/arrow/async_reader/mod.rs
Lines 983 to 995 in cc1444a
```rust
pub(crate) async fn fetch<T: AsyncFileReader + Send>(
    &mut self,
    input: &mut T,
    projection: &ProjectionMask,
    selection: Option<&RowSelection>,
    batch_size: usize,
    cache_mask: Option<&ProjectionMask>,
) -> Result<()> {
    // Figure out what ranges to fetch
    let FetchRanges {
        ranges,
        page_start_offsets,
    } = self.fetch_ranges(projection, selection, batch_size, cache_mask);
```
Thus it does feel possible to have the situation you explain, where pages needed to evaluate the row selection weren't fetched 🤔
The error comes from my direct test on a Parquet file; let me add new tests for the scenario.
I have been thinking about how to test this scenario and have some ideas (it is probably time to do some fuzz testing / testing with very selective predicates). I hope to help write some additional tests later this week.
Thank you alamb, it looks like there's still something unresolved in this PR. I'm going to resolve it in the next few days. In the meantime I may update or rebase the branch multiple times, so I converted the PR to a draft.
The things left are:
- Add a benchmark for different value types to determine the final length at which to do the selection/bitmask conversion
- Add some guidance or a tool to draw the charts, so we can collect more statistics from different platforms
- For the design of the synthetic page, we all agree it's not a good idea; I need to find another method to handle the sparse page
- Add new tests to check whether the bitmask method can handle all kinds of skipped pages in a sparse column chunk
Thank you so much @hhhizzz -- this is super exciting and I will give top priority to reviewing this PR as you make changes.
parquet/src/column/reader.rs
Outdated
```rust
// Some writers omit data pages for sparse column chunks and encode the gap
// as a reader-visible error. Use the metadata peek to synthesise a page of
// null definition levels so downstream consumers see consistent row counts.
self.try_create_synthetic_page(metadata)?;
```
This feels very fragile and likely to result in weird record shredding bugs - https://github.com/apache/arrow-rs/pull/8733/files#r2483674920
Additionally I think it would imply that the predicate pushdown is "reversing" earlier forms of pushdown and relying on the IO implementation to have chosen to do a sparse read - this feels unfortunate
parquet/src/column/reader.rs
Outdated
```rust
if self.descr.max_rep_level() != 0 {
    return Err(general_err!(
        "cannot synthesise sparse page for column with repetition levels ({message})"
    ));
}

if self.descr.max_def_level() == 0 {
    return Err(general_err!(
        "cannot synthesise sparse page for required column ({message})"
    ));
}
```
I think this would mean we error if we try to push down on a column with repetition levels or on a required column - this seems like quite a major regression.
I took a quick look. Whilst I think orchestrating this skipping at the RecordReader level does have a certain elegance, it runs into the issue that the masked selections aren't necessarily page-aligned.
By definition the mask selection strategy requests rows that weren't part of the original selection; the problem is that this could result in requesting rows for pages that we know are irrelevant. In some cases this just results in wasted IO; however, when using prefetching IO systems (such as AsyncParquetReader) it results in errors. I'm not a big fan of the hack of creating empty pages.
I think a better solution would be to ensure we only construct MaskChunks that don't cross page boundaries. Ideally this would be done on a per-leaf-column basis, but tbh I suspect just doing it globally would probably work just fine.
Edit: If one was feeling fancy, one could ignore page boundaries where both pages were present in the original selection, although in practice I suspect this would not make a huge difference.
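A rough sketch of the suggested clamping, with hypothetical names (`page_offsets` would come from the offset index; the real chunk construction would then emit one piece per returned range):

```rust
// Split a mask chunk [chunk_start, chunk_start + chunk_len) at every data
// page boundary that falls inside it, so no piece crosses a page boundary.
fn split_at_page_boundaries(
    chunk_start: usize,
    chunk_len: usize,
    page_offsets: &[usize], // row offsets where data pages start
) -> Vec<(usize, usize)> {
    let chunk_end = chunk_start + chunk_len;
    let mut cuts: Vec<usize> = page_offsets
        .iter()
        .copied()
        .filter(|&o| o > chunk_start && o < chunk_end)
        .collect();
    cuts.push(chunk_end);

    let mut pieces = Vec::new();
    let mut start = chunk_start;
    for cut in cuts {
        pieces.push((start, cut - start));
        start = cut;
    }
    pieces
}
```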


Which issue does this PR close?
Rationale for this change
This change improves the performance of reading Parquet files.
What changes are included in this PR?
This pull request introduces significant improvements to row selection and filtering in the Parquet Arrow reader, optimizing batch reading and handling of sparse data. The most important changes include a new mask-based row selection state, enhancements to synthetic page handling, and expanded test coverage for these features.
Row selection and filtering improvements:
- Introduced RowSelectionState in read_plan.rs, which dynamically chooses between a bitmap mask array and a selector queue for efficient row selection during batch reads. This enables streaming with contiguous mask segments and reduces overhead for sparse selections.
- Updated ParquetRecordBatchReader to leverage the mask-based selection, streaming record batches using boolean masks and applying Arrow filtering for selected rows. This avoids intermediate materialization and improves performance for sparse row selections. If the average length of the RowSelectors is less than 8, they are replaced by a bitmap mask.

Synthetic page and definition level handling:
A challenge with the mask-based approach is that some pages may be skipped, and due to the streaming design of the reader, it’s not always possible to determine in advance which pages will be skipped.
To address this, additional logic was added to return None when a page is skipped, ensuring correct handling in such cases.
Together, these improvements enhance both efficiency and correctness in row selection, filtering, and sparse data processing for the Parquet Arrow reader.
Are these changes tested?
Are there any user-facing changes?
No