Commit bac0cb5

Small optimization in Parquet varint decoder (#8742)
# Which issue does this PR close?

- Part of #5853.

# Rationale for this change

Following the recent improvements in Thrift decoding, the percentage of time spent decoding LEB128 encoded integers has increased.

# What changes are included in this PR?

This PR modifies the varint decoder to first test for integers that can be encoded in a single byte (using zig-zag encoding, the maximum int that can be encoded is 63). Many of the fields in the Parquet footer (including all enum values) will be in this range, so optimizing for this frequent occurrence makes sense.

# Are these changes tested?

Should be covered by existing tests.

# Are there any user-facing changes?

No.
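The single-byte bound quoted in the description can be checked with a small sketch. The helper names `zigzag_encode` and `uleb128_len` are hypothetical and are not taken from the parquet crate; they only illustrate that a signed value is zig-zag mapped to an unsigned one, and that any unsigned value below 128 fits in one LEB128 byte.

```rust
/// Zig-zag map a signed integer to an unsigned one so that values of small
/// magnitude (positive or negative) become small unsigned values.
/// Hypothetical helper for illustration, not the crate's API.
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Number of bytes the ULEB128 encoding of `v` occupies: each byte carries
/// 7 payload bits, and the high bit marks a continuation.
fn uleb128_len(mut v: u64) -> usize {
    let mut len = 1;
    while v >= 0x80 {
        v >>= 7;
        len += 1;
    }
    len
}
```

Under these definitions, `uleb128_len(zigzag_encode(63))` and `uleb128_len(zigzag_encode(-64))` are both 1, while 64 and -65 each need a second byte, which matches the stated maximum of 63.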
1 parent cd61ead commit bac0cb5

1 file changed: +7 -2 lines changed


parquet/src/parquet_thrift.rs

Lines changed: 7 additions & 2 deletions
@@ -276,8 +276,13 @@ pub(crate) trait ThriftCompactInputProtocol<'a> {
 
     /// Read a ULEB128 encoded unsigned varint from the input.
     fn read_vlq(&mut self) -> ThriftProtocolResult<u64> {
-        let mut in_progress = 0;
-        let mut shift = 0;
+        // try the happy path first
+        let byte = self.read_byte()?;
+        if byte & 0x80 == 0 {
+            return Ok(byte as u64);
+        }
+        let mut in_progress = (byte & 0x7f) as u64;
+        let mut shift = 7;
         loop {
             let byte = self.read_byte()?;
             in_progress |= ((byte & 0x7F) as u64).wrapping_shl(shift);
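The hunk ends partway through the loop, so the following is only a self-contained sketch of how the fast path and the general loop might fit together. `SliceReader`, its `String` error type, and the tail of the loop (the continuation check and the `shift += 7` step) are assumptions made for illustration; they are not shown in the diff and may differ from the crate's actual code.

```rust
/// Hypothetical stand-in for the crate's input protocol: a cursor over a byte slice.
struct SliceReader<'a> {
    buf: &'a [u8],
}

impl<'a> SliceReader<'a> {
    /// Pop one byte from the front of the slice, or report end of input.
    fn read_byte(&mut self) -> Result<u8, String> {
        let (&first, rest) = self
            .buf
            .split_first()
            .ok_or_else(|| "unexpected end of input".to_string())?;
        self.buf = rest;
        Ok(first)
    }

    /// Read a ULEB128 encoded unsigned varint, checking the single-byte case first.
    fn read_vlq(&mut self) -> Result<u64, String> {
        // Fast path: a clear high bit means the whole value is in this byte.
        let byte = self.read_byte()?;
        if byte & 0x80 == 0 {
            return Ok(byte as u64);
        }
        // Slow path: keep the low 7 bits and continue with the next byte.
        let mut in_progress = (byte & 0x7F) as u64;
        let mut shift = 7;
        loop {
            let byte = self.read_byte()?;
            in_progress |= ((byte & 0x7F) as u64).wrapping_shl(shift);
            // Assumed loop tail: stop when the continuation bit is clear.
            if byte & 0x80 == 0 {
                return Ok(in_progress);
            }
            shift += 7;
        }
    }
}
```

With this sketch, reading from the bytes `[0x05, 0xAC, 0x02]` yields 5 through the fast path and then 300 through the loop, since `0xAC` contributes 44 and `0x02` contributes 2 shifted left by 7.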
