PROTOTYPE: manually unroll vlq decoder loop #8757

alamb · 2025-10-31T14:02:47Z

Which issue does this PR close?

Follow-on to "Small optimization in Parquet varint decoder" #8742 from @etseidl

Rationale for this change

Per #8742 (review), let's get crazy and see whether manually unrolling the VLQ decoder (a hot loop) helps performance.

What changes are included in this PR?

Manually unroll the loop
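For context, here is a minimal sketch of the kind of loop-based VLQ decoder that unrolling would replace. It is not the parquet crate's actual code: `SliceReader`, `read_vlq_loop`, and the `String` error type are hypothetical names chosen for illustration. Only `read_byte` mirrors a name used later in this thread.

```rust
/// Hypothetical byte source standing in for the reader used by the real decoder.
struct SliceReader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> SliceReader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    /// Returns the next byte, or an error at end of input.
    fn read_byte(&mut self) -> Result<u8, String> {
        let byte = *self
            .buf
            .get(self.pos)
            .ok_or_else(|| "unexpected EOF".to_string())?;
        self.pos += 1;
        Ok(byte)
    }

    /// Loop-based VLQ decode: 7 payload bits per byte, least significant
    /// group first, MSB set on every byte except the last.
    fn read_vlq_loop(&mut self) -> Result<u64, String> {
        let mut in_progress = 0u64;
        let mut shift = 0u32;
        loop {
            let byte = self.read_byte()?;
            in_progress |= ((byte & 0x7f) as u64) << shift;
            if byte & 0x80 == 0 {
                return Ok(in_progress);
            }
            shift += 7;
            // A u64 needs at most 10 bytes; reject anything longer.
            if shift >= 64 {
                return Err("varint too long".to_string());
            }
        }
    }
}
```

The per-iteration shift update and loop branch are the overhead that manual unrolling tries to remove.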

Are these changes tested?

CI passes. I am running benchmarks.

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

etseidl · 2025-10-31T16:33:41Z

If you want to go truly crazy there's https://arxiv.org/html/2403.06898v1 🤣 Pretty clever.

etseidl · 2025-10-31T17:33:15Z

FWIW I tried using

// byte N
let byte = self.read_byte()?;
if byte & 0x80 == 0 {
    return Ok(in_progress | ((byte as u64) << N*7));
}
in_progress |= ((byte & 0x7f) as u64) << N*7;

as the unit of work and that worked a bit better on my laptop. No need to mask byte if we already know the MSB is 0.
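The unit of work above can be stamped out once per byte position. The sketch below does this with a local macro. It is an illustration under assumptions, not the code from this PR: `read_vlq_unrolled` is a hypothetical name, and a plain byte iterator returning `Option` stands in for `self.read_byte()?`.

```rust
/// Decode one VLQ-encoded u64 with every byte position unrolled.
/// `bytes` is a hypothetical stand-in for the decoder's `read_byte`.
fn read_vlq_unrolled(bytes: &mut impl Iterator<Item = u8>) -> Option<u64> {
    let mut in_progress = 0u64;

    // One unit of work for byte N, following the pattern in the comment above.
    macro_rules! step {
        ($n:expr) => {
            let byte = bytes.next()?;
            if byte & 0x80 == 0 {
                // Final byte: the MSB is already 0, so no mask is needed.
                return Some(in_progress | ((byte as u64) << ($n * 7)));
            }
            in_progress |= ((byte & 0x7f) as u64) << ($n * 7);
        };
    }

    // A u64 needs at most 10 groups of 7 bits.
    step!(0);
    step!(1);
    step!(2);
    step!(3);
    step!(4);
    step!(5);
    step!(6);
    step!(7);
    step!(8);
    step!(9);

    // Continuation bit still set after 10 bytes: treat as malformed input.
    None
}
```

Each shift amount is a compile-time constant here, so there is no running shift counter to update. Whether that helps in practice is what the benchmarks in this thread are meant to show.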

alamb · 2025-10-31T21:32:03Z

Once I get some benchmark numbers I will rework to use your pattern @etseidl -- maybe even try with some const functions to avoid the repetition
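One possible reading of "const functions" is a small helper parameterized by a const generic byte index, so the shift stays a compile-time constant without repeating the body. This is only a sketch of that idea under assumptions: `vlq_step` and `read_vlq_const` are hypothetical names, and a byte iterator stands in for `read_byte`.

```rust
/// Fold byte N into `acc`. Returns true when this was the final byte.
/// Unlike the pattern suggested above, this always masks, which is
/// harmless on the final byte because its MSB is already 0.
#[inline(always)]
fn vlq_step<const N: u32>(byte: u8, acc: &mut u64) -> bool {
    *acc |= ((byte & 0x7f) as u64) << (N * 7);
    byte & 0x80 == 0
}

/// Decode one VLQ-encoded u64 using the const-generic step helper.
fn read_vlq_const(bytes: &mut impl Iterator<Item = u8>) -> Option<u64> {
    let mut acc = 0u64;
    if vlq_step::<0>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<1>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<2>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<3>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<4>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<5>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<6>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<7>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<8>(bytes.next()?, &mut acc) { return Some(acc); }
    if vlq_step::<9>(bytes.next()?, &mut acc) { return Some(acc); }
    // More than 10 bytes cannot encode a u64.
    None
}
```

The call sites still repeat, but each is one line and the decoding logic lives in a single place.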

etseidl · 2025-10-31T23:13:07Z

Flamegraph showing the amount of time spent in read_vlq (about 24% of the total including drop time) with the latest parser. This is the same data as the "wide" metadata benchmark.

alamb · 2025-10-31T23:26:05Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/vlq_opt (d67a9a1) to bac0cb5 diff
BENCH_NAME=metadata
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench metadata
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_vlq_opt
Results will be posted here when complete

alamb · 2025-10-31T23:28:24Z

🤖: Benchmark completed

Details

group                             alamb_vlq_opt                          main
-----                             -------------                          ----
decode parquet metadata           1.00      9.9±0.03µs        ? ?/sec    1.05     10.4±0.78µs        ? ?/sec
decode parquet metadata (wide)    1.00     43.4±0.28ms        ? ?/sec    1.05     45.7±2.75ms        ? ?/sec
open(default)                     1.00      9.7±0.03µs        ? ?/sec    1.04     10.1±0.06µs        ? ?/sec
open(page index)                  1.14    199.4±1.13µs        ? ?/sec    1.00    174.7±0.81µs        ? ?/sec

alamb · 2025-11-01T10:12:42Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/vlq_opt (d67a9a1) to bac0cb5 diff
BENCH_NAME=metadata
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench metadata
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_vlq_opt
Results will be posted here when complete

alamb · 2025-11-01T10:15:47Z

🤖: Benchmark completed

Details

group                             alamb_vlq_opt                          main
-----                             -------------                          ----
decode parquet metadata           1.01      9.8±0.04µs        ? ?/sec    1.00      9.7±0.02µs        ? ?/sec
decode parquet metadata (wide)    1.02     43.1±0.75ms        ? ?/sec    1.00     42.4±0.60ms        ? ?/sec
open(default)                     1.03      9.7±0.05µs        ? ?/sec    1.00      9.5±0.03µs        ? ?/sec
open(page index)                  1.15    199.2±0.77µs        ? ?/sec    1.00    173.7±0.53µs        ? ?/sec

alamb · 2025-11-03T13:30:42Z

Benchmarks imply this is about 15% slower for open(page index) (roughly 199µs vs. 174µs in both runs).

PROTOTYPE: manually unroll vlq decoder

205d39d

github-actions bot added the parquet Changes to the parquet crate label Oct 31, 2025

cmt

d67a9a1

alamb mentioned this pull request Oct 31, 2025

Small optimization in Parquet varint decoder #8742

Merged

alamb closed this Nov 3, 2025
