-
Notifications
You must be signed in to change notification settings - Fork 194
Improve enum decoding -- alternative with binary search instead of map lookup #985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is an alternative to google#980 that doesn't make a big difference in terms of performance of AOT compiled benchmarks, but makes a big difference when compiling to Wasm, comapred to google#980. When decoding an enum value, we call a callback in the enum field's `FieldInfo`. The callback then indexes a map mapping enum numbers to Dart values. When these conditions hold: - The known enum numbers are all positive. (so that we can use a list and index it with the number) - The known enum numbers are more than 70% of the large known enum number. (so that the list won't have a lot of `null` entries, wasting heap space) We now generate a list instead of a map to map enum numbers to Dart values. Note: similar to the map, the list is runtime allocated. No new code generated per message or enum type. Wasm benchmarks: - Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 48200.0 us` - PR google#980: `protobuf_PackedEnumDecoding(RunTimeRaw): 42120.0 us` - Diff: -12.6% - After: `protobuf_PackedEnumDecoding(RunTimeRaw): 35733.3 us` - Diff against PR google#980: -15% - Diff against master: -25% AOT benchmarks: - Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 49180.0 us` - PR google#980: `protobuf_PackedEnumDecoding(RunTimeRaw): 45726.82 us` - Diff: -7% - This PR: `protobuf_PackedEnumDecoding(RunTimeRaw): 42929.7 us` - Diff against PR google#980: -6% - Diff agianst master: -12%
With upcoming change we'll improve decoding performance of enums, but there will be a difference between "sparse" and "dense" enum decoding performance even though they'll both be faster. To be able to measure the difference add a "sparse" enum type and a benchmark for decoding it. "Sparse" means the enum has large gaps between known enum values, or negative enum values. When decoding this kind of enums, the mapping from the wire `varint` to the Dart value for the enum needs to be done by binary search, map lookup, or similar. For "dense" enums, we can have a list of enum values and index the list directly with the `varint` value, after a range check. These changes will be done in the follow-up PR(s).
This was referenced May 12, 2025
Closed
Closing in favor or #981. |
osa1
added a commit
that referenced
this pull request
May 15, 2025
When decoding an enum value, we call a callback in the enum field's `FieldInfo`. The callback then indexes a map mapping enum numbers to Dart values. When these conditions hold: - The known enum numbers are all positive. (so that we can use a list and index it with the number) - The known enum numbers are more than 70% of the large known enum number. (so that the list won't have a lot of `null` entries, wasting heap space) We now generate a list instead of a map to map enum numbers to Dart values. Similar to the map, the list is runtime allocated. No new code generated per message or enum type. AOT benchmarks: - Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 47585.14 us.` - After: `protobuf_PackedEnumDecoding(RunTimeRaw): 38974.566666666666 us.` - Diff: -18% Wasm benchmarks: - Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 52225.0 us.` - After: `protobuf_PackedEnumDecoding(RunTimeRaw): 34283.33333333333 us.` - Diff: -34% **Alternatives considered:** - #980 uses a map always, but eliminates the `valueOf` closure. - #985 uses a list always, and does binary search in the list when the list is "shallow". - #987 is the same as #985, but instead of calling the `valueOf` closure it stores an extra field in `FieldInfo`s for whether to binary search or directly index. These are all slower than the current PR.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is the same as #981, except instead of map lookup we do binary search.
This is slightly faster than #981 on Wasm, and slightly slower than #981 on native.
Wasm results:
AOT results:
Note: in the benchmarks we have a small number (13) known values in the enum. I wonder if the benchmark results change in favor of one or the other with larger number of known values.