
@buyaa-n commented Jul 22, 2024

When the input length is not a multiple of 3 bytes, some bits of the final encoded character are unused. The Base64 encoder sets those bits to 0, but the decoder currently doesn't check them and accepts any combination of values. As a result, multiple inputs can decode to the same value. For example, the 1-byte input 'A' is encoded as 2 Base64 characters plus 2 padding characters, "QQ==". The last 4 bits of the second Q are unused and set to 0, but because the decoder doesn't validate them, 2^4 = 16 different strings decode to the same value: "QQ==", "QR==", "QS==", "QT==", "QU==", "QV==", "QW==", "QX==", "QY==", "QZ==", "Qa==", "Qb==", "Qc==", "Qd==", "Qe==", and "Qf==" all decode to 65, the ASCII code of 'A'.
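To make the aliasing concrete, here is a small Python sketch (not the .NET implementation) that decodes the sixteen strings from the example by hand. Like a lenient decoder, it keeps only the 8 bits that carry data and discards the 4 unused bits. The helper name `lenient_decode_two_chars` is hypothetical and only handles a single `XY==` block.

```python
import string

# Standard Base64 alphabet (RFC 4648, Table 1): index = 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def lenient_decode_two_chars(s: str) -> int:
    """Decode one 'XY==' block to a byte, ignoring the 4 unused bits (hypothetical helper)."""
    assert len(s) == 4 and s.endswith("==")
    hi = ALPHABET.index(s[0])  # 6 data bits
    lo = ALPHABET.index(s[1])  # top 2 bits are data, low 4 bits are unused
    return (hi << 2) | (lo >> 4)  # drop the unused bits without checking them

# The second character ranges over values 16..31 ('Q'..'f'): same top 2 bits, any low 4 bits.
variants = ["Q" + ALPHABET[i] + "==" for i in range(16, 32)]
decoded = {v: lenient_decode_two_chars(v) for v in variants}

print(variants)
print(set(decoded.values()))  # {65}: all sixteen strings collapse to 'A'
```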

The spec (RFC 4648, Section 3.5) states that unused (pad) bits MUST be set to zero by conforming encoders, and that decoders MAY reject an encoding if the pad bits have not been set to zero. Since conforming encoders only ever produce one of these encodings, we see no reason to keep accepting the other combinations that decode to the same result.
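The check a strict decoder needs is small. In a final block ending in `==`, the last data character must have its low 4 bits clear. In a block ending in a single `=`, its low 2 bits must be clear. The Python sketch below shows only that check, not the .NET change itself. The function name `pad_bits_are_zero` is hypothetical, and the sketch assumes the input is already well-formed, padded standard Base64 (valid alphabet, length a multiple of 4).

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def pad_bits_are_zero(encoded: str) -> bool:
    """Return True if the unused (pad) bits of the final block are all zero.

    Assumes `encoded` is well-formed, padded standard Base64; alphabet and
    length validation are out of scope for this sketch.
    """
    if encoded.endswith("=="):
        # 1 data byte: 2 chars carry 12 bits, the last 4 bits are unused.
        return (ALPHABET.index(encoded[-3]) & 0b1111) == 0
    if encoded.endswith("="):
        # 2 data bytes: 3 chars carry 18 bits, the last 2 bits are unused.
        return (ALPHABET.index(encoded[-2]) & 0b11) == 0
    # No padding: the data length is a multiple of 3 bytes, so no bits are unused.
    return True

# "QQ==" and "QUI=" are the canonical encodings of b"A" and b"AB"; "QUJD" is b"ABC".
for s in ["QQ==", "QR==", "Qf==", "QUI=", "QUJ=", "QUJD"]:
    print(s, pad_bits_are_zero(s))  # True, False, False, True, False, True

# A conforming encoder (here Python's base64) always zero-fills, so its output passes.
assert all(pad_bits_are_zero(base64.b64encode(bytes([b])).decode()) for b in range(256))
assert all(pad_bits_are_zero(base64.b64encode(bytes([b, b])).decode()) for b in range(256))
```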

This should not be a breaking change in practice: my quick research did not find any encoder that produces output with non-zero unused bits. However, it could break tests that generate random Base64-encoded text.
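Randomly generated test strings are the likely casualties because most of them have non-zero pad bits. The Python sketch below counts, over all 64 possible values of the last data character, the fraction that leaves the unused bits zero. The function name `fraction_valid` is hypothetical, and the sketch assumes the generator picks that character uniformly.

```python
from fractions import Fraction

ALPHABET_SIZE = 64  # each Base64 character encodes a 6-bit value, 0..63

def fraction_valid(unused_bits: int) -> Fraction:
    """Fraction of 6-bit character values whose low `unused_bits` bits are all zero."""
    mask = (1 << unused_bits) - 1
    ok = sum(1 for value in range(ALPHABET_SIZE) if value & mask == 0)
    return Fraction(ok, ALPHABET_SIZE)

print(fraction_valid(4))  # 'XY==' tail, 4 unused bits: 1/16
print(fraction_valid(2))  # 'XYZ=' tail, 2 unused bits: 1/4
```

Under that uniformity assumption, roughly 15 of every 16 random strings ending in `==`, and 3 of every 4 ending in a single `=`, would now be rejected. Unpadded strings are unaffected.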

As a follow-up, we should apply the same fix to the Convert.FromBase64XYZ(...) overloads.

Related to #50233 (comment)


@buyaa-n added this to the 9.0.0 milestone Jul 26, 2024
@buyaa-n merged commit eb765b7 into dotnet:main Jul 26, 2024
@buyaa-n deleted the reject-non-zero branch July 26, 2024 17:51
@buyaa-n mentioned this pull request Jul 26, 2024
@github-actions bot locked and limited conversation to collaborators Aug 27, 2024
stephentoub added a commit that referenced this pull request Oct 24, 2025
…e not set to 0 (#121044)

Implements RFC 4648 Section 3.5 compliance by rejecting Base64 input
where unused bits are not set to zero. This ensures that decoding is
deterministic—only one valid encoding exists for each byte sequence.

Fixes #105262
---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: stephentoub <[email protected]>