Rewrite ParquetRecordBatchStream in terms of the PushDecoder
#8159
Conversation
From my perspective, the goal of "show we can use the push decoder to rewrite the async decoder" is now complete. I will pick this PR up again once we get the push decoder merged.
# Which issue does this PR close?

- Part of #8000
- Closes #7983

# Rationale for this change

This PR is the first part of separating IO and decode operations in the Rust parquet decoder. Decoupling IO and CPU enables several important use cases:

1. Different IO patterns (e.g. not buffering the entire row group at once)
2. Different IO APIs, e.g. io_uring, OpenDAL, etc.
3. Deliberate prefetching within a file
4. Avoiding code duplication between the `ParquetRecordBatchStreamBuilder` and `ParquetRecordBatchReaderBuilder`

# What changes are included in this PR?

1. Add the new `ParquetDecoderBuilder` and `ParquetDecoder`, plus tests

It is effectively an explicit version of the state machine used in the existing async reader (where the state machine is encoded as Rust `async` / `await` structures).

# Are these changes tested?

Yes -- there are extensive tests for the new code.

Note that this PR actually adds a **3rd** path for control flow (when I claim this will remove duplication!). In follow-on PRs I will convert the existing readers to use this new pattern, similarly to the sequence I did for the metadata decoder:

- #8080
- #8340

Here is a preview of a PR that consolidates the async reader to use the push decoder internally (and removes one duplicate):

- #8159
- Closes #8022

# Are there any user-facing changes?

Yes, a new API, but no changes to the existing APIs.

---------

Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Adrian Garcia Badaracco <[email protected]>
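As a rough illustration of the IO/decode split this enables (the trait, method names, and `DecodeResult` variants below are illustrative stand-ins, not the actual API added by that PR), the caller owns all IO and only hands bytes to the decoder when it asks for them:

```rust
use std::ops::Range;

// Illustrative stand-ins for a push-style decoder API; not the crate's real types.
enum DecodeResult<T> {
    /// The decoder cannot make progress without these byte ranges
    NeedsData(Vec<Range<u64>>),
    /// The decoder produced a decoded value
    Data(T),
    /// Nothing left to decode
    Finished,
}

trait PushDecode<T> {
    fn try_decode(&mut self) -> Result<DecodeResult<T>, String>;
    fn push_ranges(&mut self, ranges: Vec<Range<u64>>, data: Vec<Vec<u8>>);
}

// The caller performs the IO however it likes (buffered reads, io_uring,
// OpenDAL, prefetching, ...); the decoder itself only ever does CPU work.
fn drive<T>(
    decoder: &mut dyn PushDecode<T>,
    mut fetch: impl FnMut(&[Range<u64>]) -> Vec<Vec<u8>>,
) -> Result<Vec<T>, String> {
    let mut out = Vec::new();
    loop {
        match decoder.try_decode()? {
            DecodeResult::NeedsData(ranges) => {
                let data = fetch(&ranges);
                decoder.push_ranges(ranges, data);
            }
            DecodeResult::Data(value) => out.push(value),
            DecodeResult::Finished => return Ok(out),
        }
    }
}
```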
```rust
    }

    fn compute_cache_projection_inner(&self, filter: &RowFilter) -> Option<ProjectionMask> {
        // Do not compute the projection mask if the predicate cache is disabled
```
this is the fix from the following PR applied to the push decoder (now that the paths are unified)
```rust
    ) -> ReadResult<T> {
        // TODO: calling build_array multiple times is wasteful

        let meta = self.metadata.row_group(row_group_idx);
```
The stream reader has the same logic / algorithm, but now it uses the copy in the push decoder (which is based on this code) instead of this
```rust
        Ok(decode_result)
    }

    /// Attempt to return the next [`ParquetRecordBatchReader`] or return what data is needed
```
This is a new API on the `ParquetPushDecoder` that is needed to implement the existing `next_row_group` API: https://docs.rs/parquet/latest/parquet/arrow/async_reader/struct.ParquetRecordBatchStream.html#method.next_row_group
```rust
/// buffering.
///
/// [`Stream`]: https://docs.rs/futures/latest/futures/stream/trait.Stream.html
pub struct ParquetRecordBatchStream<T> {
```
I am quite pleased with this -- `ParquetRecordBatchStream` is now clearly separated into the IO handling piece (`request_state`) and the decoding piece (`decoder`).
```rust
        let request_state = std::mem::replace(&mut self.request_state, RequestState::Done);
        match request_state {
            // No outstanding requests, proceed to setup next row group
            RequestState::None { input } => {
```
This is now the core state machine of `ParquetRecordBatchStream`, and I am pleased it represents what is going on in a straightforward way: it alternates between decode and I/O.
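A simplified model of the IO side of that state machine (the real `RequestState` in this PR also owns the input and the in-flight future; this sketch only captures the shape of the states the loop alternates through):

```rust
use std::ops::Range;

// Simplified stand-in for the stream's IO-tracking state; the push decoder
// (not shown) holds all of the CPU-side decoding state. The poll loop hands
// control back and forth: decode until data is needed, then issue a request.
enum RequestState<I> {
    /// No request in flight; the input is available to issue the next one
    None { input: I },
    /// A request for these byte ranges is in flight
    Outstanding { ranges: Vec<Range<u64>> },
    /// The stream will produce no more data
    Done,
}
```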
```rust
    #[tokio::test]
    #[allow(deprecated)]
    async fn test_in_memory_row_group_sparse() {
```
This test was introduced in the following PR by @thinkharderdev.
I believe it is meant to verify the PageIndex is used to prune IO.
The reasons I propose deleting this test are:

- IO pruning based on the PageIndex is covered in the newer IO tests, for example: `// Expect to see only data IO for one page for each column for each row group`
- This test is in terms of non-public APIs (the `ReaderFactory` and `InMemoryRowGroup`) which don't reflect the requests that are actually made (the ranges are coalesced, for example, for each column's pages)
```rust
impl ParquetDecoderState {
    /// If actively reading a RowGroup, return the currently active
    /// ParquetRecordBatchReader and advance to the next group.
    fn try_next_reader(
```
This is a newly added "batched" API that makes it possible to read the next reader (that is ready to produce record batches)
Is this so that we can preserve the `pub async fn next_row_group(&mut self) -> Result<Option<ParquetRecordBatchReader>>` API on the async reader?
Yes, exactly.
It actually turns out that is a pretty clever API that I didn't know about -- it lets one interleave IO and CPU more easily.
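For reference, a minimal sketch of how a caller can use `next_row_group` to keep the two kinds of work separate (`process` is just a placeholder callback; the error conversion assumes the parquet crate's `arrow` feature):

```rust
use arrow_array::RecordBatch;
use parquet::arrow::async_reader::ParquetRecordBatchStream;
use parquet::errors::Result;

// Drain the stream one row group at a time: next_row_group() does the IO and
// returns a synchronous ParquetRecordBatchReader over the buffered pages, so
// the inner decode loop is pure CPU work that can run off the IO task.
async fn drain_by_row_group<T>(
    mut stream: ParquetRecordBatchStream<T>,
    mut process: impl FnMut(RecordBatch),
) -> Result<()>
where
    T: parquet::arrow::async_reader::AsyncFileReader + Unpin + Send + 'static,
{
    while let Some(reader) = stream.next_row_group().await? {
        for batch in reader {
            process(batch?);
        }
    }
    Ok(())
}
```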
```rust
    ///
    /// This function is called in a loop until the decoder is ready to return
    /// data (has the required pages buffered) or is finished.
    fn transition(self) -> Result<(Self, DecodeResult<()>), ParquetError> {
```
Reworked so it can be shared between `try_next_batch` and `try_next_reader`.
It also now avoids a self-recursive call, which I think is a (minor) improvement.
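As a generic illustration of the loop-instead-of-recursion point (a hypothetical state machine, not the PR's actual code):

```rust
// Hypothetical state machine, only to illustrate the loop-vs-recursion shape.
enum State {
    NeedMoreWork(u32),
    Ready(u32),
}

impl State {
    // Performs exactly one transition and returns the new state, rather than
    // recursing on itself until it reaches `Ready`.
    fn transition(self) -> State {
        match self {
            State::NeedMoreWork(0) => State::Ready(0),
            State::NeedMoreWork(n) => State::NeedMoreWork(n - 1),
            ready => ready,
        }
    }
}

fn run(mut state: State) -> u32 {
    // The caller drives the machine in a loop until it reports readiness.
    loop {
        state = state.transition();
        if let State::Ready(v) = state {
            return v;
        }
    }
}
```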
```rust
    ///
    /// See examples on [`ParquetRecordBatchStreamBuilder::new`]
    pub fn build(self) -> Result<ParquetRecordBatchStream<T>> {
        let num_row_groups = self.metadata.row_groups().len();
```
The whole point of this PR is to remove all this code (and instead use the copy in the push decoder)
```rust
        let request_state = RequestState::None { input: input.0 };

        Ok(ParquetRecordBatchStream {
```
You can see the Stream is much simpler now -- only the decoder and an object to track the current I/O state
This PR is now pretty much ready for review. It builds on a test refactor (#8754, below); once that is merged I will mark this one ready for review.
…level APIs (#8754)

# Which issue does this PR close?

- Related to #8677
- Part of #8159

# Rationale for this change

I am reworking how the parquet decoder's state machine works in #8159. One of the unit tests, `test_cache_projection_excludes_nested_columns`, uses non-public APIs that I am changing. Rather than rewrite them into other non-public APIs, I think it would be better if this test is in terms of public APIs.

# What changes are included in this PR?

1. Refactor `test_cache_projection_excludes_nested_columns` to use high level APIs

# Are these changes tested?

They are run in CI.

I also verified this test covers the intended functionality by commenting it out:

```diff
--- a/parquet/src/arrow/async_reader/mod.rs
+++ b/parquet/src/arrow/async_reader/mod.rs
@@ -724,7 +724,9 @@ where
             cache_projection.union(predicate.projection());
         }
         cache_projection.intersect(projection);
-        self.exclude_nested_columns_from_cache(&cache_projection)
+        // TEMP don't exclude nested columns
+        //self.exclude_nested_columns_from_cache(&cache_projection)
+        Some(cache_projection)
     }

     /// Exclude leaves belonging to roots that span multiple parquet leaves (i.e. nested columns)
```

And then running the test:

```shell
cargo test --all-features --test arrow_reader
```

And the test fails (as expected):

```
---- predicate_cache::test_cache_projection_excludes_nested_columns stdout ----
thread 'predicate_cache::test_cache_projection_excludes_nested_columns' panicked at parquet/tests/arrow_reader/predicate_cache.rs:244:9:
assertion `left == right` failed: Expected 0 records read from cache, but got 100
  left: 100
 right: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    predicate_cache::test_cache_projection_excludes_nested_columns

test result: FAILED. 88 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.20s
```

# Are there any user-facing changes?

No, this is only test changes.
```rust
    }

    fn get_bytes(&self, start: u64, length: usize) -> Result<Bytes, ParquetError> {
        if start > self.file_len {
```
The async decoder doesn't know (or need to know) the entire file length, so I removed this somewhat more specific error message and will instead rely on the underlying source reporting errors when appropriate.
Thank you Andrew. This is my first review in the codebase, but fwiw, this looks good to me.
```rust
        if let Some(limit) = &mut self.limit {
            *limit -= rows_after;

        // Issue a request to fetch a single range, returining the Outstanding state
```
If it's to fetch a single range, why does it take `Vec<Range<u64>>` as a parameter?
I suppose the comment is wrong, not the parameter.
Fixed in ee64444
```rust
        // (aka can have references internally) and thus must
        // own the input while the request is outstanding.
        let future = async move {
            let data = input.get_byte_ranges(ranges_captured).await?;
```
An aside: I don't understand why the default implementation for the AsyncReader fetches range by range sequentially instead of utilizing concurrency of the underlying runtime:

> /// Retrieve multiple byte ranges. The default implementation will call `get_bytes` sequentially

Please let me know if it doesn't resonate with you either, and I can open an issue for that.
> An aside: I don't understand why the default implementation for the AsyncReader fetches range by range sequentially instead of utilizing concurrency of the underlying runtime:

I think you are referring to this (arrow-rs/parquet/src/arrow/async_reader/mod.rs, lines 79 to 91 in db876a9):

```rust
    fn get_byte_ranges(&mut self, ranges: Vec<Range<u64>>) -> BoxFuture<'_, Result<Vec<Bytes>>> {
        async move {
            let mut result = Vec::with_capacity(ranges.len());
            for range in ranges.into_iter() {
                let data = self.get_bytes(range).await?;
                result.push(data);
            }
            Ok(result)
        }
        .boxed()
    }
```
I think one reason is that the concurrency model is different depending on the runtime (e.g. the way you launch concurrent IO using tokio is different than how you launch concurrent tasks for io_uring, for example). Also there may be benefits to doing larger swaths of IO -- e.g. S3 doesn't actually support multiple ranges in a single request.

So in my mind the way "utilizing concurrency of the underlying runtime" is achieved is by providing an implementation of AsyncFileReader with the appropriate specialization for get_ranges.

One thing we could consider, FWIW, is to remove the default implementation, which would force each impl to specialize get_ranges 🤔

BTW one of my primary motivations for extracting the parquet state machine into ParquetPushDecoder is precisely to make it easier to do such specialized IO. I have plans to write a blog post about this topic, but it will probably take me another month or so.
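To make the "specialize get_byte_ranges" suggestion concrete, here is a sketch of fetching ranges concurrently; `fetch_range` is a placeholder for whatever ranged read the underlying storage offers, and this is not the crate's default implementation:

```rust
use std::ops::Range;

use futures::future::try_join_all;

// Placeholder for a single ranged read against the underlying storage
// (e.g. an HTTP GET with a Range header); not part of the parquet crate.
async fn fetch_range(url: &str, range: Range<u64>) -> std::io::Result<Vec<u8>> {
    let _ = (url, range);
    Ok(Vec::new())
}

// Issue all ranged reads at once and await them together; try_join_all drives
// the futures concurrently and returns the results in the input order.
async fn fetch_ranges_concurrently(
    url: &str,
    ranges: Vec<Range<u64>>,
) -> std::io::Result<Vec<Vec<u8>>> {
    try_join_all(ranges.into_iter().map(|r| fetch_range(url, r))).await
}
```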
```rust
        } else {
            // All rows skipped, read next row group
            continue;

        let request_state = std::mem::replace(&mut self.request_state, RequestState::Done);
```
Why was this ownership trick needed? Perhaps you could comment in the code?
It is the way I could get the Rust ownership rules to be happy (aka ensure that `self.request_state` always has a valid value and can't be in some partial state). I have added a comment in 3ec7448.
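For readers unfamiliar with the idiom, a self-contained example of the `std::mem::replace` pattern (illustrative types, not the parquet ones):

```rust
enum State {
    Idle(String), // owns some resource
    Working,
    Done,
}

struct Machine {
    state: State,
}

impl Machine {
    fn step(&mut self) {
        // We cannot move `self.state` out through `&mut self` directly, because
        // that would leave the field uninitialized. Swapping a placeholder in
        // first keeps `self.state` valid at every point, while giving us
        // ownership of the old state so we can consume it by value.
        let old = std::mem::replace(&mut self.state, State::Done);
        self.state = match old {
            State::Idle(resource) => {
                drop(resource); // we own it, so we can consume it
                State::Working
            }
            State::Working => State::Done,
            State::Done => State::Done,
        };
    }
}
```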
```rust
        let decoder = ParquetPushDecoderBuilder {
            // Async reader doesn't know the overall size of the input, but it
            // is not required for decoding as we will already have the metadata
            input: 0,
```
I missed the previous PR, but am a bit confused with the `input` field of `ArrowReaderBuilder`. Is it meant to represent arbitrary input, specific to a specialized type (in this case file_length for `ParquetPushDecoderBuilder`)? I wonder if it would've been better if we had something like this:

```rust
struct ParquetPushDecoderBuilder {
    reader_builder: ArrowReaderBuilder,
    file_len: u64,
}
```

Just a thought, not intended to be addressed here.
> I wonder if it would've been better if we had something like this:

I agree this would be much cleaner.

The input field is confusing in the context of the "push decoder" as there is (by design) no input.

However, the current structure is designed so the exact same builder code can be shared for the three different decoder types. Using an ArrowReaderBuilder internally is an interesting idea, but we would need to find some way to pass along options (either by duplicating methods from ArrowReaderBuilder to pass through, or constructing the push decoder builder from the ArrowReaderBuilder).

However, I will try and change the type from u64 to some new type where this context can be commented, rather than have this strange 0.
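For example, one possible shape for such a type (just a sketch of the idea, not the final design; using an `Option` is one way to represent "unknown"):

```rust
/// File length passed to the push decoder builder.
///
/// The async reader does not know (or need) the overall file length, since it
/// already has the parquet metadata; a dedicated type lets that be stated
/// explicitly instead of passing a "strange 0".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileLength(Option<u64>);

impl FileLength {
    fn known(len: u64) -> Self {
        FileLength(Some(len))
    }

    fn unknown() -> Self {
        FileLength(None)
    }
}
```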
Co-authored-by: Vukasin Stefanovic <[email protected]>
Thank you very much for the review @vustef
```rust
            parquet_metadata,
            ArrowReaderOptions::default(),
        )

    pub fn try_new_decoder(parquet_metadata: Arc<ParquetMetaData>) -> Result<Self, ParquetError> {
```
This function was introduced in a PR which has not been released yet -- and thus this is not a breaking API change. Likewise for the changes to `ParquetPushDecoderBuilder`.
# Which issue does this PR close?

- ParquetRecordBatchStream (async API) in terms of the PushDecoder #8677

I am also working on a blog post about this.

TODOs:

- `test_cache_projection_excludes_nested_columns` in terms of higher level APIs (refactor `test_cache_projection_excludes_nested_columns` to use high level APIs #8754)

# Rationale for this change

A new ParquetPushDecoder was implemented here

I need to refactor the async and sync readers to use the new push decoder in order to:

# What changes are included in this PR?

- Rewrite `ParquetRecordBatchStream` to use `ParquetPushDecoder`

# Are these changes tested?

Yes, by the existing CI tests.

I also ran several benchmarks, both in arrow-rs and in DataFusion, and I do not see any substantial performance difference (as expected):

# Are there any user-facing changes?

No