Percent-decode URLs in canonical comparisons #11088

    // Decode any percent-encoded characters in the path.
    if let Ok(path) = urlencoding::decode(url.path()).map(|path| path.to_string()) {
        url.set_path(&path);
    }

Hmm, this actually might be wrong. What if a slash is percent-encoded? Decoding it would turn it into a path-segment separator.
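Here is a minimal, std-only Rust sketch of this concern. It uses hand-rolled helpers (`hex_val`, `percent_decode` and `segment_count`, all hypothetical and not uv's code). Like the snippet's call to `urlencoding::decode`, `percent_decode` decodes the whole path at once. Unlike `urlencoding::decode`, which returns an error on invalid UTF-8, this helper decodes lossily. Once `%2F` is decoded, the path gains a segment.

```rust
/// Value of one ASCII hex digit, or `None` if `b` is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode every `%XX` escape in `input`; malformed escapes are kept verbatim.
/// Invalid UTF-8 in the decoded bytes is replaced with U+FFFD (lossy).
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Number of non-empty `/`-separated segments in a path.
fn segment_count(path: &str) -> usize {
    path.split('/').filter(|s| !s.is_empty()).count()
}

fn main() {
    let path = "/simple/a%2Fb";
    let decoded = percent_decode(path);
    // The encoded slash was part of one segment; after decoding it is a separator.
    println!("{path}: {} segments", segment_count(path)); // /simple/a%2Fb: 2 segments
    println!("{decoded}: {} segments", segment_count(&decoded)); // /simple/a/b: 3 segments
}
```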
I guess we could go the other way -- always percent-encode the URL?
But that's also not quite right, because if the URL is already percent-encoded, we'd be re-encoding it. It's not idempotent.
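A std-only sketch of the idempotence problem. It assumes a hypothetical `encode_path` that escapes every byte outside the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_`, `~`), except `/`. This is not the `url` crate's encode set, which leaves `%` alone. The point is that any encoder that escapes `%` turns an already-encoded `%2B` into `%252B`. As a result, the two `torch` filenames still compare unequal after encoding.

```rust
/// Percent-encode every byte except RFC 3986 unreserved characters and `/`.
/// Because `%` itself is escaped, already-encoded input gets encoded again.
fn encode_path(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~' | b'/') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

fn main() {
    let verbatim = "torch-2.5.1+cpu.whl";
    let encoded = "torch-2.5.1%2Bcpu.whl";
    // The verbatim name is encoded once; the already-encoded name is double-encoded.
    println!("{}", encode_path(verbatim)); // torch-2.5.1%2Bcpu.whl
    println!("{}", encode_path(encoded)); // torch-2.5.1%252Bcpu.whl
    assert_ne!(encode_path(verbatim), encode_path(encoded));
}
```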
I guess we want to percent-decode each path segment, but not slashes?
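One way to read this suggestion, as a std-only sketch: split on literal `/` first, decode each segment on its own, and compare the resulting lists rather than a rejoined string. `hex_val`, `percent_decode` and `decoded_segments` are hypothetical helpers, not uv's code. Because the split happens before decoding, `a%2Fb` stays a single segment (`a/b`) and never matches the two-segment path `a/b`. Meanwhile, `%2B` and `+` do compare equal.

```rust
/// Value of one ASCII hex digit, or `None` if `b` is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode every `%XX` escape; malformed escapes are kept, invalid UTF-8 is lossy.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Split on literal `/` first, then decode each segment independently.
fn decoded_segments(path: &str) -> Vec<String> {
    path.split('/').map(percent_decode).collect()
}

fn main() {
    // A `+` escaped as `%2B` matches the verbatim `+`.
    assert_eq!(
        decoded_segments("/whl/torch-2.5.1%2Bcpu.whl"),
        decoded_segments("/whl/torch-2.5.1+cpu.whl"),
    );
    // An escaped slash stays inside its segment, so these differ.
    println!("{:?}", decoded_segments("/a%2Fb")); // ["", "a/b"]
    println!("{:?}", decoded_segments("/a/b")); // ["", "a", "b"]
}
```

Joining these decoded segments back together with `/` would bring the ambiguity back. This is why the list (or a re-escaped form) is what gets compared.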
Okay, I'm actually not totally sure how to do this.
I tried figuring this out with https://url.spec.whatwg.org, but it's still unclear how to handle `file` URLs vs. `https` URLs. If we're eager, there is a complete decoding algorithm there, though.
What if the segment contains a percent-encoded slash, though?
Honestly, we could also consider just decoding +, if it's "special" (or something like that).
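A sketch of that narrower option. It assumes the only escape worth normalizing is `%2B`, in either hex case. `decode_plus_only` is a hypothetical helper. Every other escape, including `%2F`, is left untouched, so segment boundaries cannot change and the transformation is idempotent. RFC 3986 does not treat a reserved character (such as `+`) and its percent-encoded form as equivalent in general. This is therefore a pragmatic normalization, not a spec-mandated one.

```rust
/// Decode only `%2B`/`%2b` to `+`, leaving every other escape (e.g. `%2F`) as-is.
/// A double-encoded `%252B` is also left alone, since it does not contain `%2B`.
fn decode_plus_only(path: &str) -> String {
    path.replace("%2B", "+").replace("%2b", "+")
}

fn main() {
    let a = decode_plus_only("/whl/torch-2.5.1%2Bcpu.whl");
    let b = decode_plus_only("/whl/torch-2.5.1+cpu.whl");
    assert_eq!(a, b);
    // An encoded slash is untouched, so the segment structure is preserved.
    println!("{}", decode_plus_only("/a%2Fb%2bc")); // /a%2Fb+c
}
```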
I think the url crate helps you here: https://docs.rs/url/latest/url/struct.PathSegmentsMut.html#method.extend
> Each segment is percent-encoded like in `Url::parse` or `Url::join`, except that `%` and `/` characters are also encoded (to `%25` and `%2F`). This is unlike `Url::parse` where `%` is left as-is in case some of the input is already percent-encoded, and `/` denotes a path segment separator.
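A std-only sketch of a canonical form along those lines: split on `/`, decode each segment, then re-escape only `%` and `/` before rejoining. The helpers (`hex_val`, `percent_decode`, `escape_segment`, `canonical_path`) are hypothetical. This is a simplification of what the quoted documentation describes, since `PathSegmentsMut::extend` also applies the regular path encode set (this sketch leaves, for example, spaces unescaped). Escaping `%` before `/` matters. Otherwise, the `%` introduced by `%2F` would itself be escaped.

```rust
/// Value of one ASCII hex digit, or `None` if `b` is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode every `%XX` escape; malformed escapes are kept, invalid UTF-8 is lossy.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Re-escape the two bytes that would change meaning after rejoining:
/// `%` (escape introducer) first, then `/` (segment separator).
fn escape_segment(segment: &str) -> String {
    segment.replace('%', "%25").replace('/', "%2F")
}

/// Decode each segment, re-escape `%` and `/`, and rejoin with `/`.
fn canonical_path(path: &str) -> String {
    path.split('/')
        .map(|segment| escape_segment(&percent_decode(segment)))
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    let encoded = canonical_path("/whl/torch-2.5.1%2Bcpu.whl");
    let verbatim = canonical_path("/whl/torch-2.5.1+cpu.whl");
    assert_eq!(encoded, verbatim);
    // An escaped slash survives, and applying the function twice changes nothing.
    let slash = canonical_path("/a%2Fb");
    assert_eq!(slash, "/a%2Fb");
    assert_eq!(canonical_path(&slash), slash);
    println!("{encoded}"); // /whl/torch-2.5.1+cpu.whl
}
```

Decoding a segment undoes exactly the `%25`/`%2F` escapes this function produces. As a result, `canonical_path` is idempotent, which addresses the concern about re-encoding already-encoded URLs.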
From the spec, it also seems that `file://` and `https://` have different rules, so we should test that our solution works for both.
ed57db2 to bdf74bc
b7dbfb9 to a6e5ef0
a6e5ef0 to 75baf50
BurntSushi left a comment
LGTM!
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.5.25` -> `0.5.27` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.5.27`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0527)

[Compare Source](astral-sh/uv@0.5.26...0.5.27)

##### Enhancements

- Avoid setting permissions during tar extraction ([#11191](astral-sh/uv#11191))
- Remove warnings for missing lower bounds ([#11195](astral-sh/uv#11195))
- Update PubGrub to set-based outdated priority tracking ([#11169](astral-sh/uv#11169))
- Improve error messages for `uv pip install` with `--extra` or `--all-extras` and invalid sources ([#11193](astral-sh/uv#11193))
- Sign Docker images using GitHub attestations ([#8685](astral-sh/uv#8685))

##### Preview features

- Don't expand self-referential extras in the build backend ([#11142](astral-sh/uv#11142))

##### Performance

- Filter discovered Python executables by source before querying ([#11143](astral-sh/uv#11143))
- Optimize exclusion computation for markers ([#11158](astral-sh/uv#11158))
- Use Astral-maintained `tokio-tar` fork ([#11174](astral-sh/uv#11174))
- Remove unneeded `.clone()` ([#11127](astral-sh/uv#11127))

##### Bug fixes

- Fix relative paths in bytecode compilation ([#11177](astral-sh/uv#11177))
- Percent-decode URLs in canonical comparisons ([#11088](astral-sh/uv#11088))
- Respect concurrency limits in parallel index fetch ([#11182](astral-sh/uv#11182))
- Use wire JSON schema for conflict items ([#11196](astral-sh/uv#11196))
- Use explicit `_GLibCVersion` tuple in uv-python crate ([#11122](astral-sh/uv#11122))

##### Documentation

- Add Git SHA locking behavior to docs ([#11125](astral-sh/uv#11125))
- Add best-practice flags to `pip install` example in troubleshooting guide ([#11194](astral-sh/uv#11194))
- Set `VIRTUAL_ENV` in Jupyter kernels ([#11155](astral-sh/uv#11155))
- Add instructions for deactivating an environment ([#11200](astral-sh/uv#11200))

### [`v0.5.26`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0526)

[Compare Source](astral-sh/uv@0.5.25...0.5.26)

##### Enhancements

- Add support for `uvx python` ([#11076](astral-sh/uv#11076))
- Allow `--no-dev --invert` in `uv tree` ([#11068](astral-sh/uv#11068))
- Update `uv python install --reinstall` to reinstall all previous versions ([#11072](astral-sh/uv#11072))
- Consistently write log messages with capitalized first word ([#11111](astral-sh/uv#11111))
- Suggest `--build-backend` when `--backend` is passed to `uv init` ([#10958](astral-sh/uv#10958))
- Improve retry trace message ([#11108](astral-sh/uv#11108))

##### Performance

- Remove unnecessary UTF-8 conversion in hash parsing ([#11110](astral-sh/uv#11110))

##### Bug fixes

- Ignore non-hash fragments in HTML API responses ([#11107](astral-sh/uv#11107))
- Avoid resolving symbolic links when querying Python interpreters ([#11083](astral-sh/uv#11083))
- Avoid sharing state between universal and non-universal resolves ([#11051](astral-sh/uv#11051))
- Error when `--script` is passing a non-PEP 723 script ([#11118](astral-sh/uv#11118))
- Make metadata deserialization failures non-fatal in the cache ([#11105](astral-sh/uv#11105))
- Mark metadata as dynamic when reading from built wheel cache ([#11046](astral-sh/uv#11046))
- Propagate credentials for `<index>/simple` to `<index>/...` endpoints ([#11074](astral-sh/uv#11074))
- Fix conflicting extra bug during `uv sync` ([#11075](astral-sh/uv#11075))

##### Documentation

- Add PyTorch XPU instructions to the PyTorch guide ([#11109](astral-sh/uv#11109))
- Add docs for signal handling ([#11041](astral-sh/uv#11041))
- Explain build frontend vs. build backend ([#11094](astral-sh/uv#11094))
- Fix formatting of `RUST_LOG` documentation ([#10053](astral-sh/uv#10053))
- Fix typo in `--no-deps` description ([#11073](astral-sh/uv#11073))
- Reflow CLI documentation comments ([#11040](astral-sh/uv#11040))
- Shorten "Using existing Python versions" nav item so it fits on one line ([#11077](astral-sh/uv#11077))
- Some minor touch-ups to the Python install guide ([#11116](astral-sh/uv#11116))
- Update Dependabot tracking issue link ([#11054](astral-sh/uv#11054))
- Update documentation for running in a container ([#11052](astral-sh/uv#11052))
- Upgrade PyTorch version in documentation ([#11114](astral-sh/uv#11114))
- Use `sys_platform` in lieu of `platform_system` in PyTorch docs ([#11113](astral-sh/uv#11113))
- Use positive (rather than negative) markers in PyTorch examples ([#11112](astral-sh/uv#11112))
- Fix unnecessary backslashes in brackets ([#11059](astral-sh/uv#11059))
- Suggest setting copy link mode in GitLab integration guide ([#11067](astral-sh/uv#11067))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
Summary
This PR adds an additional normalization step to `CanonicalUrl` whereby we now percent-decode the path, to ensure that (e.g.) `torch-2.5.1%2Bcpu.cxx11.abi-cp39-cp39-linux_x86_64.whl` and `torch-2.5.1+cpu.cxx11.abi-cp39-cp39-linux_x86_64.whl` are considered equal. Further, when generating the "reinstall" report, we use the canonical URL rather than the verbatim URL.

In making this change, I also learned that we don't apply any of the normalization passes to `file://` URLs. I inadvertently removed that normalization in 93d606a, since setting the password or URL on a `file://` URL errors, but we now suppress those errors anyway.

Closes #11082.
Test Plan

    python3.9 -m pip install torch-2.5.1+cpu.cxx11.abi-cp39-cp39-linux_x86_64.whl --platform linux_x86_64 --target foo --no-deps
    cargo run pip install torch-2.5.1+cpu.cxx11.abi-cp39-cp39-linux_x86_64.whl --python-platform linux --python-version 3.9 --target foo --no-deps

… `~` symbol for the reinstall.