Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Oct 31, 2025

Which issue does this PR close?

Rationale for this change

Per #8742 (review) let's get crazy and see if manually unrolling the vlq decoder (hot loop) will help performance

What changes are included in this PR?

Manually unroll the loop

Are these changes tested?

CI passes . I am running benchmarks

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 31, 2025
@etseidl
Copy link
Contributor

etseidl commented Oct 31, 2025

If you want to go truly crazy there's https://arxiv.org/html/2403.06898v1 🤣 Pretty clever.

@etseidl
Copy link
Contributor

etseidl commented Oct 31, 2025

FWIW I tried using

// byte N
let byte = self.read_byte()?;
if byte & 0x80 == 0 {
    return Ok(in_progress | ((byte as u64) << N*7));
}
in_progress |= ((byte & 0x7f) as u64) << N*7;

as the unit of work and that worked a bit better on my laptop. No need to mask byte if we already know the MSB is 0.

@alamb
Copy link
Contributor Author

alamb commented Oct 31, 2025

Once I get some benchmark numbers I will rework to use your pattern @etseidl -- maybe even try with some const functions to avoid the repetition

@etseidl
Copy link
Contributor

etseidl commented Oct 31, 2025

Flamegraph showing the amount of time spent in read_vlq (about 24% of the total including drop time) with the latest parser. This is the same data as the "wide" metadata benchmark.

Screen Shot 2025-10-31 at 4 04 48 PM

@alamb
Copy link
Contributor Author

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/vlq_opt (d67a9a1) to bac0cb5 diff
BENCH_NAME=metadata
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench metadata
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_vlq_opt
Results will be posted here when complete

@alamb
Copy link
Contributor Author

alamb commented Oct 31, 2025

🤖: Benchmark completed

Details

group                             alamb_vlq_opt                          main
-----                             -------------                          ----
decode parquet metadata           1.00      9.9±0.03µs        ? ?/sec    1.05     10.4±0.78µs        ? ?/sec
decode parquet metadata (wide)    1.00     43.4±0.28ms        ? ?/sec    1.05     45.7±2.75ms        ? ?/sec
open(default)                     1.00      9.7±0.03µs        ? ?/sec    1.04     10.1±0.06µs        ? ?/sec
open(page index)                  1.14    199.4±1.13µs        ? ?/sec    1.00    174.7±0.81µs        ? ?/sec

@alamb
Copy link
Contributor Author

alamb commented Nov 1, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/vlq_opt (d67a9a1) to bac0cb5 diff
BENCH_NAME=metadata
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench metadata
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_vlq_opt
Results will be posted here when complete

@alamb
Copy link
Contributor Author

alamb commented Nov 1, 2025

🤖: Benchmark completed

Details

group                             alamb_vlq_opt                          main
-----                             -------------                          ----
decode parquet metadata           1.01      9.8±0.04µs        ? ?/sec    1.00      9.7±0.02µs        ? ?/sec
decode parquet metadata (wide)    1.02     43.1±0.75ms        ? ?/sec    1.00     42.4±0.60ms        ? ?/sec
open(default)                     1.03      9.7±0.05µs        ? ?/sec    1.00      9.5±0.03µs        ? ?/sec
open(page index)                  1.15    199.2±0.77µs        ? ?/sec    1.00    173.7±0.53µs        ? ?/sec

@alamb alamb closed this Nov 3, 2025
@alamb
Copy link
Contributor Author

alamb commented Nov 3, 2025

benchmarks imply this is 15% slower for opening page index

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants