@hhhizzz hhhizzz commented Oct 28, 2025

Which issue does this PR close?

Rationale for this change

This change improves the performance of reading Parquet files.

What changes are included in this PR?

This pull request introduces significant improvements to row selection and filtering in the Parquet Arrow reader, optimizing batch reading and handling of sparse data. The most important changes include a new mask-based row selection state, enhancements to synthetic page handling, and expanded test coverage for these features.

Row selection and filtering improvements:

  • Introduced RowSelectionState in read_plan.rs, which dynamically chooses between a bitmap mask array and selector queue for efficient row selection during batch reads. This enables streaming with contiguous mask segments and reduces overhead for sparse selections.
  • Updated ParquetRecordBatchReader to use the mask-based selection: it streams record batches using boolean masks and applies Arrow filtering to keep the selected rows. This avoids intermediate materialization and improves performance for sparse row selections.
  • If the average RowSelector length (total rows divided by the number of selectors) is less than 8, the selector queue is automatically replaced by a bitmap mask.
  • Added a benchmark to determine this threshold value (8).
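
The strategy choice described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `RowSelector` is a local stand-in with `row_count` and `skip` fields, and `Strategy` and `choose_strategy` are hypothetical names. Only the threshold of 8 and the idea of averaging rows per selector come from the PR description.

```rust
/// Local stand-in for the reader's selector: a run of rows to skip or read.
#[allow(dead_code)] // `skip` is not needed for the average-length check
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical name for the two strategies described above.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Use a bitmap mask and filter decoded rows.
    Mask,
    /// Keep the queue of skip/select runs.
    Selectors,
}

/// Threshold quoted in the PR; it was picked by benchmark and may differ per platform.
const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 8;

/// Pick a strategy from the average run length (total rows / number of selectors).
fn choose_strategy(selectors: &[RowSelector]) -> Strategy {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let selector_count = selectors.len();
    // Comparing against count * threshold avoids integer-division rounding.
    if selector_count > 0 && total_rows < selector_count * AVG_SELECTOR_LEN_MASK_THRESHOLD {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}
```

In this sketch an empty selection and an average of exactly 8 both fall back to the selector queue; whether the PR treats those edge cases the same way is an assumption.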

Synthetic page and definition level handling:

A challenge with the mask-based approach is that some pages may be skipped, and due to the streaming design of the reader, it’s not always possible to determine in advance which pages will be skipped.
To address this, additional logic was added to return None when a page is skipped, ensuring correct handling in such cases.
Together, these improvements enhance both efficiency and correctness in row selection, filtering, and sparse data processing for the Parquet Arrow reader.

Are these changes tested?

  • Added new tests for interleaved skip/select row selections and mask-based sparse row selection, ensuring correctness of the new mask-based streaming logic and synthetic page handling.
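
The property these tests check can be illustrated with a small standalone sketch: for an interleaved skip/select pattern, the selector path and the mask path should return the same rows. The code below works on plain slices rather than Parquet pages, `RowSelector` is a local stand-in, and the function names are hypothetical; it is not the test code from the PR.

```rust
/// Local stand-in for the reader's selector: a run of rows to skip or read.
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Expand selectors into one boolean per row (true = keep the row).
fn selectors_to_mask(selectors: &[RowSelector]) -> Vec<bool> {
    let mut mask = Vec::new();
    for s in selectors {
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}

/// Selector path: walk the runs, copying selected ranges and jumping over skipped ones.
/// Assumes the selectors cover no more rows than `rows` contains.
fn read_with_selectors<T: Clone>(rows: &[T], selectors: &[RowSelector]) -> Vec<T> {
    let mut out = Vec::new();
    let mut offset = 0;
    for s in selectors {
        if !s.skip {
            out.extend_from_slice(&rows[offset..offset + s.row_count]);
        }
        offset += s.row_count;
    }
    out
}

/// Mask path: visit every row, then keep only those whose mask bit is set.
fn read_with_mask<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(row, _)| row.clone())
        .collect()
}
```

A check in this style builds an interleaved pattern such as skip 2, select 1, skip 1, select 3, skip 3 over ten rows and asserts that both functions return the same output.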

Are there any user-facing changes?

No

github-actions bot added the parquet label (Changes to the parquet crate) on Oct 28, 2025

hhhizzz commented Oct 28, 2025

Cargo bench results. I added emoji for readability: 🟢 means not worse, 👍🏻 means an improvement of 20% or more.
I noticed that some arrow_reader_row_filter benchmarks regress by about 5%; I am still investigating.

group    main (ratio, time ± stddev, throughput)    rowselectionempty (ratio, time ± stddev, throughput)
arrow_reader_clickbench/async/Q1 1.00 🟢 1472.6±2.89µs ? ?/sec 1.00 🟢 1475.1±3.83µs ? ?/sec
arrow_reader_clickbench/async/Q10 1.02 8.6±0.05ms ? ?/sec 1.00 🟢 8.5±0.04ms ? ?/sec
arrow_reader_clickbench/async/Q11 1.01 9.9±0.10ms ? ?/sec 1.00 🟢 9.8±0.04ms ? ?/sec
arrow_reader_clickbench/async/Q12 1.06 18.6±0.08ms ? ?/sec 1.00 🟢 17.4±0.11ms ? ?/sec
arrow_reader_clickbench/async/Q13 1.30 27.2±0.31ms ? ?/sec 1.00 🟢👍🏻 20.9±0.12ms ? ?/sec
arrow_reader_clickbench/async/Q14 1.38 26.2±0.05ms ? ?/sec 1.00 🟢👍🏻 19.0±0.03ms ? ?/sec
arrow_reader_clickbench/async/Q19 1.01 3.8±0.01ms ? ?/sec 1.00 🟢 3.8±0.05ms ? ?/sec
arrow_reader_clickbench/async/Q20 1.26 96.6±15.76ms ? ?/sec 1.00 🟢👍🏻 76.9±0.38ms ? ?/sec
arrow_reader_clickbench/async/Q21 1.31 117.5±19.40ms ? ?/sec 1.00 🟢👍🏻 89.5±0.10ms ? ?/sec
arrow_reader_clickbench/async/Q22 1.00 🟢 160.6±2.96ms ? ?/sec 1.02 164.3±2.78ms ? ?/sec
arrow_reader_clickbench/async/Q23 1.00 🟢 290.1±3.98ms ? ?/sec 1.00 🟢 290.7±3.78ms ? ?/sec
arrow_reader_clickbench/async/Q24 1.31 29.8±0.07ms ? ?/sec 1.00 🟢👍🏻 22.8±0.09ms ? ?/sec
arrow_reader_clickbench/async/Q27 1.00 🟢 65.2±0.26ms ? ?/sec 1.01 65.9±0.58ms ? ?/sec
arrow_reader_clickbench/async/Q28 1.01 67.2±0.31ms ? ?/sec 1.00 🟢 66.5±0.11ms ? ?/sec
arrow_reader_clickbench/async/Q30 1.83 40.5±0.13ms ? ?/sec 1.00 🟢👍🏻 22.2±0.04ms ? ?/sec
arrow_reader_clickbench/async/Q36 1.08 81.9±1.02ms ? ?/sec 1.00 🟢 75.9±0.11ms ? ?/sec
arrow_reader_clickbench/async/Q37 1.10 65.5±0.07ms ? ?/sec 1.00 🟢 59.4±0.08ms ? ?/sec
arrow_reader_clickbench/async/Q38 1.00 🟢 25.5±0.05ms ? ?/sec 1.03 26.2±0.06ms ? ?/sec
arrow_reader_clickbench/async/Q39 1.00 🟢 32.0±0.03ms ? ?/sec 1.00 🟢 32.1±0.11ms ? ?/sec
arrow_reader_clickbench/async/Q40 1.66 33.7±0.14ms ? ?/sec 1.00 🟢👍🏻 20.3±0.08ms ? ?/sec
arrow_reader_clickbench/async/Q41 1.53 25.7±0.05ms ? ?/sec 1.00 🟢👍🏻 16.8±0.08ms ? ?/sec
arrow_reader_clickbench/async/Q42 1.15 9.8±0.04ms ? ?/sec 1.00 🟢 8.5±0.02ms ? ?/sec
arrow_reader_clickbench/sync/Q1 1.00 🟢 1381.6±15.65µs ? ?/sec 1.02 1410.1±12.14µs ? ?/sec
arrow_reader_clickbench/sync/Q10 1.03 6.9±0.05ms ? ?/sec 1.00 🟢 6.7±0.03ms ? ?/sec
arrow_reader_clickbench/sync/Q11 1.00 🟢 8.2±0.05ms ? ?/sec 1.00 🟢 8.2±0.04ms ? ?/sec
arrow_reader_clickbench/sync/Q12 1.06 29.9±0.04ms ? ?/sec 1.00 🟢 28.2±0.67ms ? ?/sec
arrow_reader_clickbench/sync/Q13 1.09 38.6±0.06ms ? ?/sec 1.00 🟢 35.6±1.45ms ? ?/sec
arrow_reader_clickbench/sync/Q14 1.13 37.4±0.05ms ? ?/sec 1.00 🟢 33.0±0.05ms ? ?/sec
arrow_reader_clickbench/sync/Q19 1.00 🟢 3.2±0.01ms ? ?/sec 1.00 🟢 3.2±0.01ms ? ?/sec
arrow_reader_clickbench/sync/Q20 1.04 129.3±0.26ms ? ?/sec 1.00 🟢 124.0±0.20ms ? ?/sec
arrow_reader_clickbench/sync/Q21 1.05 177.1±0.30ms ? ?/sec 1.00 🟢 169.0±1.08ms ? ?/sec
arrow_reader_clickbench/sync/Q22 1.05 360.6±9.48ms ? ?/sec 1.00 🟢 342.6±2.92ms ? ?/sec
arrow_reader_clickbench/sync/Q23 1.04 314.8±15.28ms ? ?/sec 1.00 🟢 303.1±12.31ms ? ?/sec
arrow_reader_clickbench/sync/Q24 1.17 35.6±0.17ms ? ?/sec 1.00 🟢 30.4±0.06ms ? ?/sec
arrow_reader_clickbench/sync/Q27 1.00 🟢 100.3±0.20ms ? ?/sec 1.00 🟢 100.6±0.12ms ? ?/sec
arrow_reader_clickbench/sync/Q28 1.00 🟢 101.0±0.15ms ? ?/sec 1.00 🟢 101.0±0.52ms ? ?/sec
arrow_reader_clickbench/sync/Q30 1.86 38.8±0.07ms ? ?/sec 1.00 🟢👍🏻 20.9±0.05ms ? ?/sec
arrow_reader_clickbench/sync/Q36 1.06 103.2±0.12ms ? ?/sec 1.00 🟢 97.1±0.14ms ? ?/sec
arrow_reader_clickbench/sync/Q37 1.09 60.7±0.15ms ? ?/sec 1.00 🟢 55.7±0.12ms ? ?/sec
arrow_reader_clickbench/sync/Q38 1.00 🟢 20.4±0.03ms ? ?/sec 1.03 21.1±0.03ms ? ?/sec
arrow_reader_clickbench/sync/Q39 1.00 🟢 22.7±0.03ms ? ?/sec 1.00 🟢 22.7±0.02ms ? ?/sec
arrow_reader_clickbench/sync/Q40 1.74 32.1±0.05ms ? ?/sec 1.00 🟢👍🏻 18.4±0.03ms ? ?/sec
arrow_reader_clickbench/sync/Q41 1.59 23.9±0.04ms ? ?/sec 1.00 🟢👍🏻 15.0±0.02ms ? ?/sec
arrow_reader_clickbench/sync/Q42 1.16 9.2±0.03ms ? ?/sec 1.00 🟢 7.9±0.02ms ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/all_columns/async 1.00 🟢 1157.2±2.57µs ? ?/sec 1.03 1188.1±17.66µs ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/all_columns/sync 1.00 🟢 1260.0±3.78µs ? ?/sec 1.08 1359.2±1.95µs ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/async 1.00 🟢 1062.5±4.63µs ? ?/sec 1.08 1148.4±2.43µs ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/sync 1.00 🟢 1090.9±1.82µs ? ?/sec 1.07 1172.1±1.76µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/async 1.00 🟢 978.4±1.26µs ? ?/sec 1.05 1023.3±1.68µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/sync 1.00 🟢 1150.7±2.87µs ? ?/sec 1.05 1204.9±2.07µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/async 1.00 🟢 904.2±2.18µs ? ?/sec 1.05 946.6±0.78µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/sync 1.00 🟢 955.7±0.64µs ? ?/sec 1.05 1001.9±2.72µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/async 1.00 🟢 1160.4±1.68µs ? ?/sec 1.06 1234.1±1.85µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/sync 1.00 🟢 1263.1±2.77µs ? ?/sec 1.07 1348.1±1.45µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/async 1.00 🟢 1065.7±1.90µs ? ?/sec 1.06 1134.2±1.56µs ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/sync 1.00 🟢 1092.5±1.50µs ? ?/sec 1.07 1168.0±1.42µs ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/async 1.00 🟢 592.4±1.43µs ? ?/sec 1.00 🟢 592.9±1.69µs ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/sync 1.01 626.4±0.84µs ? ?/sec 1.00 🟢 622.9±0.70µs ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/async 1.01 573.7±0.49µs ? ?/sec 1.00 🟢 570.1±0.65µs ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/sync 1.00 🟢 590.3±1.04µs ? ?/sec 1.05 617.4±0.66µs ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/async 1.05 3.2±0.01ms ? ?/sec 1.00 🟢 3.0±0.00ms ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/sync 1.08 3.1±0.01ms ? ?/sec 1.00 🟢 2.9±0.00ms ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/async 1.62 2.7±0.01ms ? ?/sec 1.00 🟢👍🏻 1684.8±2.62µs ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/sync 1.66 2.6±0.01ms ? ?/sec 1.00 🟢👍🏻 1557.3±4.54µs ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/async 1.00 🟢 1298.3±4.07µs ? ?/sec 1.06 1373.2±3.09µs ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/sync 1.00 🟢 1472.5±6.34µs ? ?/sec 1.06 1567.0±2.61µs ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/async 1.00 🟢 1217.0±3.13µs ? ?/sec 1.06 1294.2±3.22µs ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/sync 1.00 🟢 1276.0±10.65µs ? ?/sec 1.03 1318.2±37.69µs ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/async 1.00 🟢 791.7±1.24µs ? ?/sec 1.06 836.0±0.99µs ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/sync 1.00 🟢 893.1±1.48µs ? ?/sec 1.06 946.5±0.86µs ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/async 1.00 🟢 744.6±0.88µs ? ?/sec 1.05 782.7±0.67µs ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/sync 1.00 🟢 794.7±3.42µs ? ?/sec 1.06 839.4±1.29µs ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/async 1.54 3.1±0.01ms ? ?/sec 1.00 🟢👍🏻 2.0±0.00ms ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/sync 1.51 3.7±0.01ms ? ?/sec 1.00 🟢👍🏻 2.4±0.00ms ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/async 1.45 2.7±0.01ms ? ?/sec 1.00 🟢👍🏻 1834.8±3.39µs ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/sync 1.41 2.6±0.02ms ? ?/sec 1.00 🟢👍🏻 1825.7±3.46µs ? ?/sec

alamb commented Oct 29, 2025

😮 thank you @hhhizzz -- I plan to review this PR carefully, but it will likely take me a few days

alamb commented Oct 29, 2025

fyi @zhuqi-lucas and @XiangpengHao

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing rowselectionempty (14647e1) to 5744743 diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=rowselectionempty
Results will be posted here when complete

fn new(selectors: Vec<RowSelector>) -> Self {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let selector_count = selectors.len();
    const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 8;
Contributor

@alamb This looks similar to my original implementation, which also used a fixed choice.

Contributor

But it's more reasonable for this PR:

Added a benchmark to determine this threshold value (8).

Contributor Author

I added a benchmark in the code; you can also try it on your machine. I find the value varies heavily across platforms: on my Mac it is 8, but on my x86 PC it can be set to around 30.
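
For readers who want to probe the crossover themselves, a rough timing harness might look like the sketch below. It is not the benchmark added in the PR: it uses only `std`, works on an in-memory `u64` slice instead of Parquet pages, and all function names are hypothetical. Since nothing is decoded, the crossover it reports should not be expected to match the 8 or 30 quoted above.

```rust
use std::time::{Duration, Instant};

/// Build an alternating skip/select pattern where every run has `run_len` rows.
/// Runs are (row_count, skip) pairs, a stand-in for the reader's selectors.
fn alternating_runs(total_rows: usize, run_len: usize) -> Vec<(usize, bool)> {
    assert!(run_len > 0, "run_len must be positive");
    let mut runs = Vec::new();
    let mut remaining = total_rows;
    let mut skip = true;
    while remaining > 0 {
        let n = run_len.min(remaining);
        runs.push((n, skip));
        remaining -= n;
        skip = !skip;
    }
    runs
}

/// Selector path: copy each selected run as one slice.
fn by_selectors(rows: &[u64], runs: &[(usize, bool)]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut offset = 0;
    for &(n, skip) in runs {
        if !skip {
            out.extend_from_slice(&rows[offset..offset + n]);
        }
        offset += n;
    }
    out
}

/// Mask path: expand the runs to one boolean per row, then filter row by row.
fn by_mask(rows: &[u64], runs: &[(usize, bool)]) -> Vec<u64> {
    let mask: Vec<bool> = runs
        .iter()
        .flat_map(|&(n, skip)| std::iter::repeat(!skip).take(n))
        .collect();
    rows.iter()
        .zip(&mask)
        .filter(|(_, keep)| **keep)
        .map(|(row, _)| *row)
        .collect()
}

/// Run `f` `iters` times and return the total elapsed wall-clock time.
fn time_it(iters: u32, mut f: impl FnMut() -> Vec<u64>) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        std::hint::black_box(f());
    }
    start.elapsed()
}
```

Calling `time_it` on `by_selectors` and `by_mask` for run lengths such as 1, 2, 4, 8, 16 and 32 gives a first impression of where one path overtakes the other on a given machine.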

Contributor

Nice @hhhizzz, I am wondering if we can switch to a more stable choice, such as a statistics-based one, but it's a good start for this PR.

Contributor Author

That's a great idea. Let me investigate further on different platforms and post the results here.

alamb commented Oct 30, 2025

🤖: Benchmark completed

group                                                                                                      main                                   rowselectionempty
-----                                                                                                      ----                                   -----------------
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                           1.00   1099.8±2.52µs        ? ?/sec    1.16   1274.8±3.26µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                          1.00   1269.3±2.75µs        ? ?/sec    1.03   1307.9±4.94µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                            1.00   1107.4±3.98µs        ? ?/sec    1.16   1281.4±2.80µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                     1.05    513.1±6.17µs        ? ?/sec    1.00    486.6±3.42µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                    1.00    660.4±3.25µs        ? ?/sec    1.02    673.5±6.14µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                      1.03    506.2±2.98µs        ? ?/sec    1.00    493.5±3.45µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                          1.04    572.5±2.07µs        ? ?/sec    1.00    552.8±2.47µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                         1.00    725.3±4.38µs        ? ?/sec    1.01    731.3±3.60µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                           1.03    585.1±3.07µs        ? ?/sec    1.00    565.3±3.97µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    238.9±3.00µs        ? ?/sec    1.14    271.5±2.70µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                                1.00    214.6±0.58µs        ? ?/sec    1.24    266.2±1.12µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                  1.00    235.8±2.71µs        ? ?/sec    1.18    278.6±3.64µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                      1.21    354.3±2.88µs        ? ?/sec    1.00    292.4±4.95µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                        1.16    326.1±0.58µs        ? ?/sec    1.00    282.3±1.53µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                     1.04    291.0±2.36µs        ? ?/sec    1.00    279.3±1.39µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                       1.20    359.2±5.50µs        ? ?/sec    1.00    299.9±2.58µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.00    980.1±3.50µs        ? ?/sec    1.14   1122.1±9.70µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.00    847.1±2.21µs        ? ?/sec    1.14    966.6±2.17µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.00    985.8±2.30µs        ? ?/sec    1.15   1133.0±3.28µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                 1.00    306.5±4.21µs        ? ?/sec    1.46    446.5±4.50µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                1.00    478.8±1.42µs        ? ?/sec    1.32    633.6±7.02µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                  1.00    311.6±5.15µs        ? ?/sec    1.46    456.1±3.48µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00    161.2±0.81µs        ? ?/sec    1.26    202.8±0.55µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs       1.00    303.1±0.74µs        ? ?/sec    1.13    343.6±0.47µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00    166.7±0.32µs        ? ?/sec    1.25    208.1±0.38µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs                    1.00     77.7±0.31µs        ? ?/sec    1.52    118.2±0.36µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs                   1.00    260.3±0.57µs        ? ?/sec    1.16    300.7±0.87µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs                     1.00     84.1±0.30µs        ? ?/sec    1.46    123.2±0.17µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs                    1.00    738.5±1.43µs        ? ?/sec    1.00    737.3±2.45µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs                   1.00    580.0±2.20µs        ? ?/sec    1.02    591.8±2.08µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs                     1.00    743.9±2.69µs        ? ?/sec    1.00    743.2±2.36µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs                                1.00     64.7±4.61µs        ? ?/sec    1.01     65.5±5.32µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs                               1.00    245.5±1.68µs        ? ?/sec    1.03    252.8±1.09µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs                                 1.00     73.1±6.90µs        ? ?/sec    1.03     75.4±1.73µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs                     1.00     94.5±0.22µs        ? ?/sec    1.00     94.6±0.23µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs                    1.00    235.9±0.94µs        ? ?/sec    1.00    235.0±1.30µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs                      1.00     99.7±0.48µs        ? ?/sec    1.01    100.2±1.65µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs                                 1.05      9.8±0.13µs        ? ?/sec    1.00      9.3±0.09µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs                                1.00    193.5±0.31µs        ? ?/sec    1.00    192.7±0.72µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs                                  1.04     15.1±0.28µs        ? ?/sec    1.00     14.4±0.14µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs                     1.00    184.5±0.56µs        ? ?/sec    1.00    185.1±0.80µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs                    1.00    346.8±0.82µs        ? ?/sec    1.00    348.1±2.56µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs                      1.00    190.0±0.52µs        ? ?/sec    1.00    190.9±0.99µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs                                 1.02     14.7±0.32µs        ? ?/sec    1.00     14.4±0.31µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs                                1.00    262.2±0.84µs        ? ?/sec    1.00    262.6±1.26µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs                                  1.00     20.1±0.58µs        ? ?/sec    1.02     20.5±0.36µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs                     1.00    364.7±1.81µs        ? ?/sec    1.01    367.8±1.49µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs                    1.00    383.5±1.16µs        ? ?/sec    1.01    388.4±1.63µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs                      1.00    372.1±0.72µs        ? ?/sec    1.01    374.9±1.16µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs                                 1.00     26.4±0.30µs        ? ?/sec    1.05     27.7±0.53µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs                                1.00    215.4±0.94µs        ? ?/sec    1.01    218.2±1.01µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs                                  1.00     30.6±0.46µs        ? ?/sec    1.16     35.5±0.52µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    124.8±0.51µs        ? ?/sec    1.00    124.3±0.20µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs                          1.12    140.0±0.78µs        ? ?/sec    1.00    125.1±1.22µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    127.4±0.77µs        ? ?/sec    1.00    127.4±0.33µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    178.7±0.42µs        ? ?/sec    1.00    178.3±1.56µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs                               1.14    236.1±1.84µs        ? ?/sec    1.00    207.6±1.98µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs                                 1.00    183.6±0.37µs        ? ?/sec    1.00    183.9±1.78µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.01     76.3±0.21µs        ? ?/sec    1.00     75.5±0.25µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.15    181.2±0.83µs        ? ?/sec    1.00    157.3±2.15µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.01     83.0±0.37µs        ? ?/sec    1.00     82.1±0.18µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    135.3±0.37µs        ? ?/sec    1.06    143.3±2.28µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs                          1.11    214.0±0.96µs        ? ?/sec    1.00    192.2±2.48µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    141.2±0.32µs        ? ?/sec    1.05    148.7±0.34µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     74.7±0.33µs        ? ?/sec    1.00     74.6±0.31µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs                               1.16    177.7±0.61µs        ? ?/sec    1.00    153.6±0.51µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs                                 1.01     78.6±0.27µs        ? ?/sec    1.00     77.9±0.28µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    111.9±0.23µs        ? ?/sec    1.02    114.5±0.26µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    124.4±0.42µs        ? ?/sec    1.07    133.7±0.60µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs                            1.02    117.7±2.19µs        ? ?/sec    1.00    115.7±0.32µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    169.4±0.63µs        ? ?/sec    1.01    170.7±0.31µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs                               1.00    211.4±1.55µs        ? ?/sec    1.13    239.2±0.76µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs                                 1.00    175.8±1.56µs        ? ?/sec    1.00    175.9±0.33µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00    202.0±0.31µs        ? ?/sec    1.00    201.2±0.43µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    225.6±0.64µs        ? ?/sec    1.12    252.9±3.10µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00    208.6±1.41µs        ? ?/sec    1.00    208.3±0.65µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    144.6±0.53µs        ? ?/sec    1.07    154.4±4.00µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    193.6±1.10µs        ? ?/sec    1.15    221.9±0.54µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    148.7±0.40µs        ? ?/sec    1.05    155.5±1.67µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00    107.7±0.78µs        ? ?/sec    1.01    108.3±2.06µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs                               1.00    172.1±1.13µs        ? ?/sec    1.16    199.4±1.08µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs                                 1.00    115.7±1.55µs        ? ?/sec    1.00    115.4±1.66µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs                                      1.01    102.1±0.21µs        ? ?/sec    1.00    101.1±0.22µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs                                     1.14    117.8±0.30µs        ? ?/sec    1.00    103.5±0.31µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs                                       1.01    105.6±0.24µs        ? ?/sec    1.00    104.2±0.58µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                                           1.01    139.5±0.29µs        ? ?/sec    1.00    137.4±1.31µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                                          1.15    194.3±0.71µs        ? ?/sec    1.00    168.5±2.16µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                                            1.02    144.8±0.33µs        ? ?/sec    1.00    142.0±1.28µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     42.5±0.26µs        ? ?/sec    1.04     44.2±0.32µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, half NULLs                              1.21    143.2±0.50µs        ? ?/sec    1.00    118.0±0.46µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, no NULLs                                1.00     47.7±0.10µs        ? ?/sec    1.03     49.1±0.21µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, mandatory, no NULLs                                      1.00    102.7±0.31µs        ? ?/sec    1.07    110.3±0.30µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, half NULLs                                     1.14    177.3±0.34µs        ? ?/sec    1.00    155.5±0.82µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, no NULLs                                       1.00    108.2±0.24µs        ? ?/sec    1.07    115.8±0.87µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, mandatory, no NULLs                                           1.01     38.5±0.12µs        ? ?/sec    1.00     38.2±0.14µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, half NULLs                                          1.22    142.1±0.35µs        ? ?/sec    1.00    116.0±0.41µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, no NULLs                                            1.01     44.0±0.18µs        ? ?/sec    1.00     43.7±0.13µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                                      1.01     98.5±0.30µs        ? ?/sec    1.00     97.3±0.17µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs                                     1.15    111.4±0.31µs        ? ?/sec    1.00     96.6±0.70µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                                       1.01    101.3±0.19µs        ? ?/sec    1.00    100.7±0.20µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                           1.02    128.6±0.19µs        ? ?/sec    1.00    126.3±0.42µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                          1.16    175.9±1.18µs        ? ?/sec    1.00    151.9±0.43µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                            1.00    130.9±0.42µs        ? ?/sec    1.00    131.3±0.34µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs                               1.05     25.8±0.31µs        ? ?/sec    1.00     24.5±0.20µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs                              1.25    127.1±0.73µs        ? ?/sec    1.00    101.7±0.59µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs                                1.04     31.1±0.27µs        ? ?/sec    1.00     30.0±0.44µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs                                      1.00     83.6±0.33µs        ? ?/sec    1.09     91.2±0.28µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs                                     1.13    156.0±0.29µs        ? ?/sec    1.00    137.5±0.39µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs                                       1.00     89.1±0.20µs        ? ?/sec    1.09     97.0±0.65µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                                           1.02     18.0±0.43µs        ? ?/sec    1.00     17.7±0.51µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                                          1.27    122.7±0.52µs        ? ?/sec    1.00     96.2±0.21µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                                            1.01     25.9±0.45µs        ? ?/sec    1.00     25.6±0.87µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                                      1.00     85.7±0.23µs        ? ?/sec    1.00     86.0±0.27µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs                                     1.00     93.8±0.90µs        ? ?/sec    1.13    106.0±0.42µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                                       1.00     88.8±0.86µs        ? ?/sec    1.00     89.1±0.37µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                           1.00    117.5±1.43µs        ? ?/sec    1.00    117.2±0.44µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                          1.00    149.7±1.76µs        ? ?/sec    1.17    174.8±0.37µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                            1.02    122.7±0.63µs        ? ?/sec    1.00    120.5±0.42µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs                               1.01    149.5±0.77µs        ? ?/sec    1.00    148.5±0.50µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs                              1.00    170.3±0.59µs        ? ?/sec    1.15    196.4±0.60µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs                                1.00    154.6±0.44µs        ? ?/sec    1.00    154.5±0.89µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs                                      1.00     90.7±0.39µs        ? ?/sec    1.09     98.7±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs                                     1.00    138.0±1.48µs        ? ?/sec    1.21    167.0±0.54µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs                                       1.00     97.0±0.69µs        ? ?/sec    1.07    104.1±0.48µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                                           1.00     43.0±0.70µs        ? ?/sec    1.05     45.2±2.20µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                                          1.00    112.4±1.30µs        ? ?/sec    1.22    137.3±0.54µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                                            1.00     49.1±0.66µs        ? ?/sec    1.05     51.5±2.74µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs                                       1.02     98.9±0.24µs        ? ?/sec    1.00     96.9±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs                                      1.15    114.0±0.17µs        ? ?/sec    1.00     99.3±0.18µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs                                        1.02    102.1±0.19µs        ? ?/sec    1.00     99.9±0.40µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                                            1.01    130.5±0.72µs        ? ?/sec    1.00    128.9±0.36µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                                           1.16    184.9±0.64µs        ? ?/sec    1.00    159.8±0.61µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                                             1.00    135.1±1.03µs        ? ?/sec    1.00    134.8±0.57µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, mandatory, no NULLs                                1.06     36.6±0.13µs        ? ?/sec    1.00     34.3±0.08µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, half NULLs                               1.21    137.3±0.65µs        ? ?/sec    1.00    113.1±0.30µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, no NULLs                                 1.00     41.6±0.11µs        ? ?/sec    1.00     41.6±0.10µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, mandatory, no NULLs                                       1.00     95.3±0.29µs        ? ?/sec    1.07    102.4±0.89µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, half NULLs                                      1.14    170.2±0.84µs        ? ?/sec    1.00    148.7±0.42µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, no NULLs                                        1.00    101.1±0.28µs        ? ?/sec    1.07    107.8±0.28µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, mandatory, no NULLs                                            1.00     30.5±0.13µs        ? ?/sec    1.01     30.7±0.17µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, half NULLs                                           1.23    134.3±0.75µs        ? ?/sec    1.00    109.6±0.50µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, no NULLs                                             1.00     36.1±0.13µs        ? ?/sec    1.00     36.1±0.18µs        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs                                     1.00      7.0±0.02ms        ? ?/sec    1.02      7.1±0.04ms        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs                                       1.00     12.8±0.09ms        ? ?/sec    1.04     13.3±0.18ms        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                     1.05    506.2±3.03µs        ? ?/sec    1.00    483.7±4.49µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                    1.00    657.7±2.06µs        ? ?/sec    1.03    676.4±6.01µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                      1.01    505.5±4.10µs        ? ?/sec    1.00    498.2±2.68µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                          1.07    735.7±5.14µs        ? ?/sec    1.00    686.6±3.75µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                         1.02    800.6±3.27µs        ? ?/sec    1.00    784.2±3.19µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                           1.06    741.7±4.61µs        ? ?/sec    1.00    699.1±3.62µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                1.02    301.0±1.47µs        ? ?/sec    1.00    296.4±1.47µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                               1.07    385.6±6.15µs        ? ?/sec    1.00    358.7±5.47µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                 1.01    306.8±4.70µs        ? ?/sec    1.00    302.4±1.34µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    229.3±4.89µs        ? ?/sec    1.20    274.1±2.95µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs                                1.00    214.2±0.62µs        ? ?/sec    1.24    266.2±1.71µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs                                  1.00    233.3±2.41µs        ? ?/sec    1.20    279.1±2.72µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                                      1.00    462.7±6.19µs        ? ?/sec    1.04    479.3±1.83µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs                                     1.00    337.0±1.23µs        ? ?/sec    1.10    371.2±1.76µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                                       1.00    470.2±2.72µs        ? ?/sec    1.04    488.4±1.63µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs                                     1.00    109.6±2.85µs        ? ?/sec    1.07    116.8±0.24µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs                                    1.09    121.1±0.33µs        ? ?/sec    1.00    111.2±0.32µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs                                      1.00    112.6±0.38µs        ? ?/sec    1.06    119.8±0.53µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                                          1.00    143.9±0.74µs        ? ?/sec    1.09    156.3±0.51µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                                         1.12    198.0±2.59µs        ? ?/sec    1.00    177.4±0.38µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                                           1.00    148.1±0.41µs        ? ?/sec    1.09    161.7±0.49µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, mandatory, no NULLs                              1.00     42.5±0.39µs        ? ?/sec    1.04     44.1±0.20µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, half NULLs                             1.23    146.9±2.60µs        ? ?/sec    1.00    119.2±0.27µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, no NULLs                               1.00     47.6±0.14µs        ? ?/sec    1.03     49.0±0.13µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, mandatory, no NULLs                                     1.00    103.0±0.21µs        ? ?/sec    1.07    110.1±0.24µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, half NULLs                                    1.14    176.9±0.47µs        ? ?/sec    1.00    154.6±0.30µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, no NULLs                                      1.00    107.9±0.35µs        ? ?/sec    1.07    115.7±0.38µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, mandatory, no NULLs                                          1.01     38.6±0.17µs        ? ?/sec    1.00     38.2±0.10µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, half NULLs                                         1.22    141.8±0.45µs        ? ?/sec    1.00    116.3±0.36µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, no NULLs                                           1.00     43.8±0.12µs        ? ?/sec    1.00     43.8±0.14µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs                                     1.02    100.9±0.22µs        ? ?/sec    1.00     98.5±0.32µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs                                    1.15    112.6±0.34µs        ? ?/sec    1.00     97.5±0.31µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs                                      1.03    104.4±0.26µs        ? ?/sec    1.00    101.4±0.34µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                                          1.01    128.7±0.36µs        ? ?/sec    1.00    127.4±0.30µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                                         1.19    181.8±0.85µs        ? ?/sec    1.00    152.7±1.58µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                                           1.01    133.8±0.46µs        ? ?/sec    1.00    132.7±1.95µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, mandatory, no NULLs                              1.08     27.0±0.40µs        ? ?/sec    1.00     25.0±0.49µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, half NULLs                             1.27    125.3±0.33µs        ? ?/sec    1.00     98.6±0.30µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, no NULLs                               1.06     31.8±0.35µs        ? ?/sec    1.00     30.1±0.40µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, mandatory, no NULLs                                     1.00     86.3±0.39µs        ? ?/sec    1.07     92.7±0.38µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, half NULLs                                    1.16    159.3±0.86µs        ? ?/sec    1.00    136.9±0.42µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, no NULLs                                      1.00     91.0±0.38µs        ? ?/sec    1.08     98.2±0.35µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, mandatory, no NULLs                                          1.00     21.0±0.58µs        ? ?/sec    1.01     21.2±0.69µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, half NULLs                                         1.27    123.4±0.41µs        ? ?/sec    1.00     96.9±0.44µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, no NULLs                                           1.00     26.8±0.83µs        ? ?/sec    1.02     27.2±0.92µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs                                     1.00     85.8±0.28µs        ? ?/sec    1.00     86.1±0.25µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs                                    1.00     93.8±0.22µs        ? ?/sec    1.13    105.9±0.35µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs                                      1.00     89.0±0.32µs        ? ?/sec    1.00     88.7±0.27µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                                          1.00    116.6±0.51µs        ? ?/sec    1.02    118.9±0.48µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                                         1.00    158.1±0.82µs        ? ?/sec    1.15    181.9±0.65µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                                           1.00    122.8±0.53µs        ? ?/sec    1.00    123.2±0.55µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, mandatory, no NULLs                              1.01    149.6±0.69µs        ? ?/sec    1.00    148.1±1.10µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, half NULLs                             1.00    169.8±1.23µs        ? ?/sec    1.15    195.9±1.84µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, no NULLs                               1.01    155.7±0.46µs        ? ?/sec    1.00    153.8±0.51µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, mandatory, no NULLs                                     1.00     91.4±0.37µs        ? ?/sec    1.07     98.0±0.51µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, half NULLs                                    1.00    138.7±1.24µs        ? ?/sec    1.20    166.6±0.54µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, no NULLs                                      1.00     96.7±0.42µs        ? ?/sec    1.07    103.8±0.87µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, mandatory, no NULLs                                          1.00     41.6±0.89µs        ? ?/sec    1.11     46.4±1.77µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, half NULLs                                         1.00    113.8±0.33µs        ? ?/sec    1.22    138.6±0.71µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, no NULLs                                           1.00     48.4±0.62µs        ? ?/sec    1.12     54.4±2.27µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs                                      1.04    106.1±0.91µs        ? ?/sec    1.00    102.3±0.23µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs                                     1.15    118.0±1.14µs        ? ?/sec    1.00    102.2±0.32µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs                                       1.04    109.4±0.92µs        ? ?/sec    1.00    105.3±0.27µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                                           1.01    137.6±1.41µs        ? ?/sec    1.00    136.7±0.36µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                                          1.16    189.3±1.94µs        ? ?/sec    1.00    163.6±0.32µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                                            1.00    142.7±1.37µs        ? ?/sec    1.00    142.2±1.46µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     36.5±0.31µs        ? ?/sec    1.00     36.3±0.11µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, half NULLs                              1.22    135.2±0.37µs        ? ?/sec    1.00    111.1±0.41µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, no NULLs                                1.00     41.3±0.32µs        ? ?/sec    1.00     41.3±0.12µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, mandatory, no NULLs                                      1.00     95.2±0.83µs        ? ?/sec    1.07    101.8±0.25µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, half NULLs                                     1.15    169.5±0.50µs        ? ?/sec    1.00    147.0±0.40µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, no NULLs                                       1.00    100.7±0.21µs        ? ?/sec    1.07    107.8±0.47µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs                                           1.02     30.8±0.10µs        ? ?/sec    1.00     30.2±0.12µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs                                          1.23    134.4±0.41µs        ? ?/sec    1.00    109.0±0.37µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs                                            1.00     36.2±0.12µs        ? ?/sec    1.00     36.3±0.16µs        ? ?/sec

@alamb
Contributor

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing rowselectionempty (14647e1) to 5744743 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=rowselectionempty
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 30, 2025

🤖: Benchmark completed

Details

group                                main                                   rowselectionempty
-----                                ----                                   -----------------
arrow_reader_clickbench/async/Q1     1.00      2.3±0.01ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     13.2±0.46ms        ? ?/sec    1.02     13.5±0.37ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     14.8±0.28ms        ? ?/sec    1.04     15.4±0.42ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.02     28.0±0.25ms        ? ?/sec    1.00     27.4±0.41ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.19     39.1±0.24ms        ? ?/sec    1.00     32.8±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.22     37.0±0.30ms        ? ?/sec    1.00     30.3±0.62ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.6±0.10ms        ? ?/sec    1.00      5.6±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00   133.9±14.45ms        ? ?/sec    1.24   165.7±11.71ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00   153.7±15.94ms        ? ?/sec    1.22   187.2±19.78ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00   276.6±19.55ms        ? ?/sec    1.18   327.1±32.72ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00    433.1±8.95ms        ? ?/sec    1.00    433.2±2.25ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.19     44.0±0.51ms        ? ?/sec    1.00     36.9±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    105.0±0.78ms        ? ?/sec    1.02    107.0±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    105.2±0.54ms        ? ?/sec    1.03    108.1±0.52ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.63     53.8±0.43ms        ? ?/sec    1.00     33.1±0.32ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.02    125.1±0.45ms        ? ?/sec    1.00    122.3±0.64ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.04     98.8±0.57ms        ? ?/sec    1.00     95.3±0.61ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     37.2±0.28ms        ? ?/sec    1.05     39.0±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     48.2±0.33ms        ? ?/sec    1.05     50.4±0.54ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.45     45.7±1.20ms        ? ?/sec    1.00     31.5±0.43ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.39     36.4±0.68ms        ? ?/sec    1.00     26.1±0.36ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.12     13.7±0.17ms        ? ?/sec    1.00     12.3±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.01ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.01      9.5±0.10ms        ? ?/sec    1.00      9.4±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.0±0.12ms        ? ?/sec    1.01     11.1±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.06     38.8±0.27ms        ? ?/sec    1.00     36.7±2.59ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.02     49.9±0.29ms        ? ?/sec    1.00     48.7±0.47ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.08     47.8±0.27ms        ? ?/sec    1.00     44.4±2.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.01      4.3±0.03ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    178.4±0.91ms        ? ?/sec    1.01    181.0±0.82ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.01    242.7±1.06ms        ? ?/sec    1.00    239.6±1.94ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    484.3±3.98ms        ? ?/sec    1.01    489.9±3.94ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00   440.8±15.88ms        ? ?/sec    1.00   441.8±14.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.10     51.6±0.80ms        ? ?/sec    1.00     46.8±0.42ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    155.5±0.99ms        ? ?/sec    1.03    159.6±0.98ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    151.5±0.72ms        ? ?/sec    1.03    156.2±1.49ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.66     52.1±0.36ms        ? ?/sec    1.00     31.4±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    155.1±1.31ms        ? ?/sec    1.00    154.8±1.54ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.06     90.0±0.37ms        ? ?/sec    1.00     85.1±0.76ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     30.0±0.17ms        ? ?/sec    1.01     30.3±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     34.7±0.35ms        ? ?/sec    1.02     35.2±0.51ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.64     44.1±0.40ms        ? ?/sec    1.00     26.9±0.42ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.47     33.3±0.34ms        ? ?/sec    1.00     22.6±0.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.15     12.8±0.14ms        ? ?/sec    1.00     11.1±0.11ms        ? ?/sec
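For readers of the tables above: the leading number in each column appears to be that side's time divided by the faster side's time for the same row, so 1.00 marks the faster side. The sketch below reproduces this for the async Q30 row (53.8 ms on main, 33.1 ms on the branch). The function name `relative_factors` is made up for illustration and is not part of the benchmark tooling; the reported factors are computed from unrounded timings, so recomputing from the rounded values shown here can differ in the last digit for other rows.

```rust
/// Hypothetical helper: express two timings relative to the faster one.
fn relative_factors(main_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = main_ms.min(branch_ms);
    (main_ms / fastest, branch_ms / fastest)
}

fn main() {
    // Timings taken from the arrow_reader_clickbench/async/Q30 row.
    let (main_factor, branch_factor) = relative_factors(53.8, 33.1);
    println!("main: {:.2}  branch: {:.2}", main_factor, branch_factor);
}
```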

@alamb
Contributor

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing rowselectionempty (14647e1) to 5744743 diff
BENCH_NAME=arrow_reader_row_filter
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_row_filter
BENCH_FILTER=
BENCH_BRANCH_NAME=rowselectionempty
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 30, 2025

🤖: Benchmark completed

Details

group                                                                                main                                   rowselectionempty
-----                                                                                ----                                   -----------------
arrow_reader_row_filter/float64 <= 99.0/all_columns/async                            1.00  1720.5±11.69µs        ? ?/sec    1.01  1739.0±13.11µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/all_columns/sync                             1.00      2.0±0.02ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/async                  1.00   1557.1±7.68µs        ? ?/sec    1.02   1586.2±7.77µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/sync                   1.00   1656.7±8.38µs        ? ?/sec    1.01  1672.2±14.29µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/async              1.00   1519.3±8.50µs        ? ?/sec    1.00  1524.9±11.77µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/sync               1.00  1870.1±14.00µs        ? ?/sec    1.00  1860.9±12.07µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/async    1.00   1350.6±4.84µs        ? ?/sec    1.00  1356.5±10.51µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/sync     1.00   1450.9±6.97µs        ? ?/sec    1.01  1467.6±12.90µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/async                             1.00   1693.8±7.84µs        ? ?/sec    1.03  1745.4±12.91µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/sync                              1.00  1976.9±14.72µs        ? ?/sec    1.01      2.0±0.02ms        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/async                   1.00   1550.6±5.40µs        ? ?/sec    1.02  1583.6±10.16µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/sync                    1.00  1634.7±18.11µs        ? ?/sec    1.02  1663.9±11.89µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/async                              1.00    935.4±4.28µs        ? ?/sec    1.01    945.1±5.97µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/sync                               1.00    988.9±5.01µs        ? ?/sec    1.00    993.7±9.68µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/async                    1.00    865.3±3.69µs        ? ?/sec    1.01    870.3±9.16µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/sync                     1.01    984.7±4.82µs        ? ?/sec    1.00    978.6±6.66µs        ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/async                                 1.47      4.1±0.02ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/sync                                  1.57      4.1±0.01ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/async                       1.34      3.6±0.01ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/sync                        1.43      3.5±0.01ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/async                                  1.00   1917.4±8.96µs        ? ?/sec    1.01  1945.3±16.00µs        ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/sync                                   1.00      2.2±0.01ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/async                        1.00   1752.7±8.89µs        ? ?/sec    1.00  1756.5±10.41µs        ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/sync                         1.00   1885.0±8.43µs        ? ?/sec    1.00  1888.4±14.49µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/async                                 1.01   1254.9±4.74µs        ? ?/sec    1.00  1243.6±11.57µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/sync                                  1.00   1383.5±5.02µs        ? ?/sec    1.00  1387.9±10.40µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/async                       1.00  1134.1±10.68µs        ? ?/sec    1.00   1134.0±7.09µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/sync                        1.00   1264.6±7.34µs        ? ?/sec    1.00   1261.6±9.84µs        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/async                             1.32      4.2±0.02ms        ? ?/sec    1.00      3.2±0.48ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/sync                              1.35      4.9±0.02ms        ? ?/sec    1.00      3.6±0.06ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/async                   1.35      3.5±0.02ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/sync                    1.37      3.4±0.01ms        ? ?/sec    1.00      2.5±0.03ms        ? ?/sec

@hhhizzz
Contributor Author

hhhizzz commented Oct 30, 2025

Running cargo bench --bench row_selection_state measures how read time varies with the length of the RowSelector.
On my M2 MacBook:
[image: output-2]
At a length of about 15, the bit mask and the RowSelector perform about the same.

On a Codespace with an AMD EPYC 7763 64-Core Processor (4 cores):
[image: output]
The crossover comes at around 30.

Contributor

@alamb alamb left a comment

First of all, thank you so much @hhhizzz -- I think this is a really nice change, and the code is well structured and a pleasure to read. Also, thank you to @zhuqi-lucas for setting the stage for much of this work.

Given the performance results so far (basically as good as or better than the existing code), I think this PR is almost ready to go.

The only thing I am not sure about is the null page / skipping thing -- I left more comments inline

I think there are several additional improvements that could be done as follow on work:

  1. The heuristic for when to use the masking strategy can likely be improved based on the types of values being filtered (for example the number of columns or the inclusion of StringView)
  2. Avoid creating RowSelection just to turn it back to a BooleanArray (I left comments inline)

false,
)]));
let values = Int32Array::from_iter_values((0..total_rows).map(|v| v as i32));
let columns: Vec<ArrayRef> = vec![Arc::new(values) as ArrayRef];
Contributor

I recommend we also test with some variable-length rows -- the selection overhead may be different for StringArray/StringViewArray than for an i32.

("read_selectors", RowSelectionStrategy::Selectors),
];

fn criterion_benchmark(c: &mut Criterion) {
Contributor

I found this code quite clear and easy to read -- thank you 🙏

I do think it would be good if we could add some of the background context here as a comment.

Specifically, it is not obvious from just the code that this benchmark can be used to determine the value of AVG_SELECTOR_LEN_MASK_THRESHOLD -- perhaps you can reuse some of the description from this PR?

Also, how did you generate these charts? If it is straightforward perhaps you can also describe that in the comments
#8733 (comment)


let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
let selector_count = selectors.len();
const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 16;
Contributor

I recommend we pull this constant somewhere that is easier to find along with a comment about what it is and how it was chosen. I suggest simply making it a constant in this module
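
To illustrate the suggestion, here is a minimal sketch of what a documented module-level constant and the strategy decision might look like. The RowSelector struct is a simplified stand-in for the parquet crate's type, and Strategy and choose_strategy are hypothetical names used only for this example. The value 16 is copied from the diff above; the strict "less than" comparison follows the PR description and is an assumption about the final code.

```rust
/// Simplified stand-in for the parquet crate's RowSelector (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Average selector length below which a bitmap mask is assumed to be
/// cheaper than a selector queue.
///
/// The value would be chosen from the row_selection_state benchmark by
/// finding where the mask and selector timings cross over.
pub const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 16;

/// Hypothetical name for the two read strategies.
#[derive(Debug, PartialEq)]
pub enum Strategy {
    Mask,
    Selectors,
}

/// Picks a strategy from the average selector length.
pub fn choose_strategy(selectors: &[RowSelector]) -> Strategy {
    let selector_count = selectors.len();
    if selector_count == 0 {
        // Nothing to select: keep the selector path.
        return Strategy::Selectors;
    }
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    // avg < threshold is the same as total_rows < threshold * count;
    // multiplying avoids integer-division rounding.
    if total_rows < AVG_SELECTOR_LEN_MASK_THRESHOLD * selector_count {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}
```

Keeping the constant at module scope with a doc comment makes the link to the benchmark discoverable without reading the function body.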

let mut cursor = start_position;
let mut initial_skip = 0;

while cursor < mask.len() && !mask.value(cursor) {
Contributor

I suspect there are all sorts of bit level hacks we can do to make this faster (as a follow on PR) - for example leveraging the code to count the number of 1s a u64 at a time
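
As a rough sketch of the kind of bit-level trick meant here, the loop that tests one bit at a time could be replaced by scanning 64 bits per step. The code below assumes the mask is available as a slice of u64 words with least-significant-bit-first ordering; next_set_bit and count_set_bits are hypothetical helpers for illustration, not functions from the arrow crates.

```rust
/// Returns the position of the first set bit at or after `start`, or None.
/// `words` must hold at least ceil(len / 64) entries; bit i lives in
/// words[i / 64] at bit position i % 64 (LSB first).
pub fn next_set_bit(words: &[u64], len: usize, start: usize) -> Option<usize> {
    if start >= len {
        return None;
    }
    let mut word_idx = start / 64;
    // Clear the bits below `start` in the first word.
    let mut word = words[word_idx] & (u64::MAX << (start % 64));
    loop {
        if word != 0 {
            // trailing_zeros finds the lowest set bit in one instruction.
            let pos = word_idx * 64 + word.trailing_zeros() as usize;
            return if pos < len { Some(pos) } else { None };
        }
        word_idx += 1;
        if word_idx * 64 >= len {
            return None;
        }
        word = words[word_idx];
    }
}

/// Counts the set bits among the first `len` bits, one u64 at a time.
pub fn count_set_bits(words: &[u64], len: usize) -> usize {
    let full = len / 64;
    let mut n: usize = words[..full].iter().map(|w| w.count_ones() as usize).sum();
    let rem = len % 64;
    if rem != 0 {
        // Only the low `rem` bits of the last word are valid.
        n += (words[full] & ((1u64 << rem) - 1)).count_ones() as usize;
    }
    n
}
```

With these helpers, the initial skip in the loop above would be next_set_bit(words, len, cursor) minus cursor, computed without visiting each bit.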

}
}

fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
Contributor

I think we can do even better than this (as a follow on PR)

The current code still converts the result of a filter (BooleanArray) to a RowSelection,
https://github.com/apache/arrow-rs/blob/cc1444a3232fa11b8485e2794a88f342bd7f97e2/parquet/src/arrow/arrow_reader/read_plan.rs#L113-L112

and then boolean_mask_from_selectors converts it back to a BooleanArray

However I think we could apply the result of evaluating the filter directly to a RowSelectionBacking::Mask

In fact, @XiangpengHao even has some (relatively crazy) techniques to combine masks quickly in #6624 (comment)
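
A minimal sketch of the idea, using plain bool slices instead of Arrow buffers: the filter result covers only the rows the current mask selects, so it can be folded into the mask in one pass without going through a RowSelection. The function name apply_filter_to_mask is hypothetical, and a real implementation would work on packed bits rather than Vec<bool>.

```rust
/// Combines an existing selection mask with a filter result.
///
/// `mask` has one entry per row in the row group. `filter` has one entry
/// per row that `mask` selects, in order. The output keeps a row only if
/// it was selected before and also passes the filter.
///
/// Panics if `filter` does not have exactly as many entries as `mask`
/// has selected rows.
pub fn apply_filter_to_mask(mask: &[bool], filter: &[bool]) -> Vec<bool> {
    let mut filter_iter = filter.iter().copied();
    let out: Vec<bool> = mask
        .iter()
        .map(|&selected| {
            if selected {
                // Consume the next filter value for each selected row.
                filter_iter
                    .next()
                    .expect("filter is shorter than the number of selected rows")
            } else {
                false
            }
        })
        .collect();
    assert!(
        filter_iter.next().is_none(),
        "filter is longer than the number of selected rows"
    );
    out
}
```

For example, a mask of [true, false, true, true, false] combined with a filter of [true, false, true] yields [true, false, false, true, false]. This is the mask analogue of composing two selections, where the second applies to the rows selected by the first.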

let remaining_records = max_records - total_records_read;
let remaining_levels = self.num_buffered_values - self.num_decoded_values;

if self.synthetic_page {
Contributor

I don't understand the need for the synthetic page -- it seems like a workaround for some case that should be handled in the control flow loop in ParquetRecordBatchReader::next_inner

Specifically, given skipping / scanning data pages works with the RowSelection approach, why does a mask approach cause additional problems? In some way the mask approach should actually decode more rows, not less (as then the filter is applied afterwards)

Contributor Author

Thanks for the thorough code review!
Yes, this is the trickiest part of the PR. When no pages are skipped, everything works as expected. But some pages can be skipped during row-group construction when the Sparse ColumnChunkData is used, meaning their values and definition/repetition levels are never read. Row selection still works because skip_records() handles this case and skips the page accordingly.

However, with the Boolean-array design, all values must be read and decoded before filtering. ParquetRecordBatchReader is a streaming reader; it has no concept of pages, so we can’t rely on page size to drive skipping there. I think the most practical approach, therefore, is to return dummy null values as placeholders for the skipped pages. If I missed something or there's a better way to do so, just let me know. 😊

A simple example:

The page size is 2 and the mask is 100001, so the row selection is read(1) skip(4) read(1).
The ColumnChunkData would be page1(10), page2(skipped), page3(01).
Using the row selection's skip(4), page2 is never read at all.
With the bit mask, however, all 6 values need to be read, but page2 is not in memory, which is why I need to construct this synthetic page.
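
The example can be worked through in a few lines of code. This is a simplified model that assumes fixed-size pages and represents the mask as a bool slice; the function names are hypothetical and pages are indexed from 0, so page2 in the example is index 1.

```rust
/// Pages that contain at least one selected row. In this model these are
/// the only pages a sparse fetch loads. `page_size` must be non-zero.
pub fn pages_with_selected_rows(mask: &[bool], page_size: usize) -> Vec<usize> {
    mask.chunks(page_size)
        .enumerate()
        .filter(|(_, rows)| rows.iter().any(|&b| b))
        .map(|(i, _)| i)
        .collect()
}

/// Pages a mask-based read would need to decode but that were never
/// fetched: every page covered by the mask minus the fetched ones.
pub fn pages_missing_for_mask(mask: &[bool], page_size: usize) -> Vec<usize> {
    let fetched = pages_with_selected_rows(mask, page_size);
    let total_pages = (mask.len() + page_size - 1) / page_size;
    (0..total_pages).filter(|p| !fetched.contains(p)).collect()
}
```

For the mask 100001 with page size 2, pages_with_selected_rows returns [0, 2] and pages_missing_for_mask returns [1], which is the page the mask-based reader would try to decode even though it was never loaded.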


For completeness, I prototyped reconstructing the readers to handle skipped pages directly, but it introduces a breaking change: every array_reader would need a page-size parameter. That’s undesirable—users shouldn’t need page-level details just to read Parquet.

Contributor

I am similarly still confused.

@hhhizzz your explanation makes sense to me in theory, but I just tested out removing the synthetic page code from this PR and the tests all still seem to pass. So that means we either have a testing gap or there is something else going on:

I looked more carefully, and it seems to me that the calculation of what pages to fetch is still based on RowSelection (not the RowSelectionCursor / RowSelectionBacking):

pub(crate) async fn fetch<T: AsyncFileReader + Send>(
    &mut self,
    input: &mut T,
    projection: &ProjectionMask,
    selection: Option<&RowSelection>,
    batch_size: usize,
    cache_mask: Option<&ProjectionMask>,
) -> Result<()> {
    // Figure out what ranges to fetch
    let FetchRanges {
        ranges,
        page_start_offsets,
    } = self.fetch_ranges(projection, selection, batch_size, cache_mask);

Thus it does feel possible to have the situation you explain, where pages needed to evaluate the row selection weren't fetched 🤔

Contributor Author

The error came from my direct testing on a Parquet file. Let me add new tests for this scenario.

Contributor

I have been thinking about how to test this scenario and have some ideas (it is probably time to do some fuzz testing / testing with very selective predicates). I hope to help write some additional tests later this week.

Contributor Author

Thank you alamb, it looks like there are still some unresolved issues in the PR. I'm going to resolve them in the next few days. In the meantime I may update or rebase the branch multiple times, so I converted the PR into a draft.
The things left are:

  1. Add benchmarks for different value types to determine the final length threshold for the selection/bitmask conversion.
  2. Add some guidance or a tool to draw the charts, so we can collect more statistics from different platforms.
  3. For the synthetic page design, we all agree it's not a good idea; I need to find another method to handle sparse pages.
  4. Add new tests to check that the bitmask method can handle all kinds of skipped pages in a sparse column chunk.

Contributor

Thank you so much @hhhizzz -- this is super exciting and I will give top priority to reviewing this PR as you make changes.

// Some writers omit data pages for sparse column chunks and encode the gap
// as a reader-visible error. Use the metadata peek to synthesise a page of
// null definition levels so downstream consumers see consistent row counts.
self.try_create_synthetic_page(metadata)?;
Contributor

This feels very fragile and likely to result in weird record shredding bugs - https://github.com/apache/arrow-rs/pull/8733/files#r2483674920

Contributor

Additionally I think it would imply that the predicate pushdown is "reversing" earlier forms of pushdown and relying on the IO implementation to have chosen to do a sparse read - this feels unfortunate

Comment on lines 630 to 640
if self.descr.max_rep_level() != 0 {
    return Err(general_err!(
        "cannot synthesise sparse page for column with repetition levels ({message})"
    ));
}

if self.descr.max_def_level() == 0 {
    return Err(general_err!(
        "cannot synthesise sparse page for required column ({message})"
    ));
}
Contributor

I think this would mean we error if we try to push down on a column with repetition levels or on a required column -- this seems like quite a major regression

Contributor

@tustvold tustvold left a comment

I took a quick look. Whilst I think orchestrating this skipping at the RecordReader level does have a certain elegance, it runs into the issue that the masked selections aren't necessarily page-aligned.

By definition the mask selection strategy requests rows that weren't part of the original selection. The problem is that this could result in requesting rows for pages that we know are irrelevant. In some cases this just results in wasted IO; however, when using prefetching IO systems (such as AsyncParquetReader) this results in errors. I'm not a big fan of the hack of creating empty pages.

I think a better solution would be to ensure we only construct MaskChunk that don't cross page boundaries. Ideally this would be done on a per-leaf column basis, but tbh I suspect just doing it globally would probably work just fine.

Edit: If one was feeling fancy, one could ignore page boundaries where both pages were present in the original selection, although in practice I suspect this not to make a huge difference.
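
One way to picture the proposal is a helper that splits a mask chunk wherever a page begins, so that no chunk spans two pages. The sketch below applies the boundaries globally rather than per leaf column, as suggested above; split_at_page_boundaries is a hypothetical name, and the page start offsets are assumed to be available as sorted row indices.

```rust
use std::ops::Range;

/// Splits `chunk` (a half-open row range) so that no resulting piece
/// crosses a page boundary. `page_starts` must be sorted ascending and
/// holds the row index at which each page begins.
pub fn split_at_page_boundaries(
    chunk: Range<usize>,
    page_starts: &[usize],
) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    let mut start = chunk.start;
    for &boundary in page_starts {
        if boundary <= start {
            // Boundary lies at or before the current position.
            continue;
        }
        if boundary >= chunk.end {
            // Remaining boundaries lie outside the chunk.
            break;
        }
        out.push(start..boundary);
        start = boundary;
    }
    if start < chunk.end {
        out.push(start..chunk.end);
    }
    out
}
```

With pages starting at rows 0, 2 and 4, the chunk 1..6 becomes 1..2, 2..4 and 4..6. A piece that falls entirely inside a page that was never fetched and contains no selected rows could then be skipped instead of decoded.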

@hhhizzz hhhizzz marked this pull request as draft November 3, 2025 09:53

Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

[Parquet]Performance Degradation with RowFilter on Unsorted Columns due to Fragmented ReadPlan
Adaptive Parquet Predicate Pushdown Evaluation

4 participants