Skip to content

Conversation

@charliermarsh
Copy link
Member

Summary

This PR adds an additional normalization step to CanonicalUrl whereby we now percent-decode the path, to ensure that (e.g.) torch-2.5.1%2Bcpu.cxx11.abi-cp39-cp39-linux_x86_64.whl and torch-2.5.1+cpu.cxx11.abi-cp39-cp39-linux_x86_64.whl are considered equal. Further, when generating the "reinstall" report, we use the canonical URL rather than the verbatim URL.

In making this change, I also learned that we don't apply any of the normalization passes to file:// URLs. I inadvertently removed it in 93d606a, since setting the password or URL on file:// URL errors -- but now suppress those errors anyway.

Closes #11082.

Test Plan

  • Downloaded a PyTorch wheel
  • python3.9 -m pip install torch-2.5.1+cpu.cxx11.abi-cp39-cp39-linux_x86_64.whl --platform linux_x86_64 --target foo --no-deps
  • cargo run pip install torch-2.5.1+cpu.cxx11.abi-cp39-cp39-linux_x86_64.whl --python-platform linux --python-version 3.9 --target foo --no-deps
  • Verified that the package had the ~ symbol for the reinstall.

// Decode any percent-encoded characters in the path.
if let Ok(path) = urlencoding::decode(url.path()).map(|path| path.to_string()) {
url.set_path(&path);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this actually might be wrong. What if a slash is percent-encoded? We'd be changing it to a path segment.

I guess we could go the other way -- always percent-encode the URL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's also not quite right... Because if the URL is already percent-encoded, we'd be re-encoding it. It's not idempotent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we want to percent-decode each path segment, but not slashes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm actually not totally sure how to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried figuring this out with https://url.spec.whatwg.org but it's still unclear how to handle file urls vs. https urls. If we're eager there is a complete decoding algorithm there though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the segment contains a percent-encoded slash, though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, we could also consider just decoding +, if it's "special" (or something like that).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each segment is percent-encoded like in Url::parse or Url::join, except that % and / characters are also encoded (to %25 and %2F). This is unlike Url::parse where % is left as-is in case some of the input is already percent-encoded, and / denotes a path segment separator.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the spec it also seems that file:// and https:// have different rules, so we should test that our solution works for both

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@charliermarsh charliermarsh merged commit 26f84e5 into main Jan 31, 2025
91 checks passed
@charliermarsh charliermarsh deleted the charlie/percent branch January 31, 2025 20:45
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Feb 4, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.5.25` -> `0.5.27` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.5.27`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0527)

[Compare Source](astral-sh/uv@0.5.26...0.5.27)

##### Enhancements

-   Avoid setting permissions during tar extraction ([#&#8203;11191](astral-sh/uv#11191))
-   Remove warnings for missing lower bounds ([#&#8203;11195](astral-sh/uv#11195))
-   Update PubGrub to set-based outdated priority tracking ([#&#8203;11169](astral-sh/uv#11169))
-   Improve error messages for `uv pip install` with `--extra` or `--all-extras` and invalid sources ([#&#8203;11193](astral-sh/uv#11193))
-   Sign Docker images using GitHub attestations ([#&#8203;8685](astral-sh/uv#8685))

##### Preview features

-   Don't expand self-referential extras in the build backend ([#&#8203;11142](astral-sh/uv#11142))

##### Performance

-   Filter discovered Python executables by source before querying ([#&#8203;11143](astral-sh/uv#11143))
-   Optimize exclusion computation for markers ([#&#8203;11158](astral-sh/uv#11158))
-   Use Astral-maintained `tokio-tar` fork ([#&#8203;11174](astral-sh/uv#11174))
-   Remove unneeded `.clone()` ([#&#8203;11127](astral-sh/uv#11127))

##### Bug fixes

-   Fix relative paths in bytecode compilation ([#&#8203;11177](astral-sh/uv#11177))
-   Percent-decode URLs in canonical comparisons ([#&#8203;11088](astral-sh/uv#11088))
-   Respect concurrency limits in parallel index fetch ([#&#8203;11182](astral-sh/uv#11182))
-   Use wire JSON schema for conflict items ([#&#8203;11196](astral-sh/uv#11196))
-   Use explicit `_GLibCVersion` tuple in uv-python crate ([#&#8203;11122](astral-sh/uv#11122))

##### Documentation

-   Add Git SHA locking behavior to docs ([#&#8203;11125](astral-sh/uv#11125))
-   Add best-practice flags to `pip install` example in troubleshooting guide ([#&#8203;11194](astral-sh/uv#11194))
-   Set `VIRTUAL_ENV` in Jupyter kernels ([#&#8203;11155](astral-sh/uv#11155))
-   Add instructions for deactivating an environment ([#&#8203;11200](astral-sh/uv#11200))

### [`v0.5.26`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0526)

[Compare Source](astral-sh/uv@0.5.25...0.5.26)

##### Enhancements

-   Add support for `uvx python` ([#&#8203;11076](astral-sh/uv#11076))
-   Allow `--no-dev --invert` in `uv tree` ([#&#8203;11068](astral-sh/uv#11068))
-   Update `uv python install --reinstall` to reinstall all previous versions ([#&#8203;11072](astral-sh/uv#11072))
-   Consistently write log messages with capitalized first word ([#&#8203;11111](astral-sh/uv#11111))
-   Suggest `--build-backend` when `--backend` is passed to `uv init` ([#&#8203;10958](astral-sh/uv#10958))
-   Improve retry trace message ([#&#8203;11108](astral-sh/uv#11108))

##### Performance

-   Remove unnecessary UTF-8 conversion in hash parsing ([#&#8203;11110](astral-sh/uv#11110))

##### Bug fixes

-   Ignore non-hash fragments in HTML API responses ([#&#8203;11107](astral-sh/uv#11107))
-   Avoid resolving symbolic links when querying Python interpreters ([#&#8203;11083](astral-sh/uv#11083))
-   Avoid sharing state between universal and non-universal resolves ([#&#8203;11051](astral-sh/uv#11051))
-   Error when `--script` is passing a non-PEP 723 script ([#&#8203;11118](astral-sh/uv#11118))
-   Make metadata deserialization failures non-fatal in the cache ([#&#8203;11105](astral-sh/uv#11105))
-   Mark metadata as dynamic when reading from built wheel cache ([#&#8203;11046](astral-sh/uv#11046))
-   Propagate credentials for `<index>/simple` to `<index>/...` endpoints ([#&#8203;11074](astral-sh/uv#11074))
-   Fix conflicting extra bug during `uv sync` ([#&#8203;11075](astral-sh/uv#11075))

##### Documentation

-   Add PyTorch XPU instructions to the PyTorch guide ([#&#8203;11109](astral-sh/uv#11109))
-   Add docs for signal handling ([#&#8203;11041](astral-sh/uv#11041))
-   Explain build frontend vs. build backend ([#&#8203;11094](astral-sh/uv#11094))
-   Fix formatting of `RUST_LOG` documentation ([#&#8203;10053](astral-sh/uv#10053))
-   Fix typo in `--no-deps` description ([#&#8203;11073](astral-sh/uv#11073))
-   Reflow CLI documentation comments ([#&#8203;11040](astral-sh/uv#11040))
-   Shorten "Using existing Python versions" nav item so it fits on one line ([#&#8203;11077](astral-sh/uv#11077))
-   Some minor touch-ups to the Python install guide ([#&#8203;11116](astral-sh/uv#11116))
-   Update Dependabot tracking issue link ([#&#8203;11054](astral-sh/uv#11054))
-   Update documentation for running in a container ([#&#8203;11052](astral-sh/uv#11052))
-   Upgrade PyTorch version in documentation ([#&#8203;11114](astral-sh/uv#11114))
-   Use `sys_platform` in lieu of `platform_system` in PyTorch docs ([#&#8203;11113](astral-sh/uv#11113))
-   Use positive (rather than negative) markers in PyTorch examples ([#&#8203;11112](astral-sh/uv#11112))
-   Fix unnecessary backslashes in brackets ([#&#8203;11059](astral-sh/uv#11059))
-   Suggest setting copy link mode in GitLab integration guide ([#&#8203;11067](astral-sh/uv#11067))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xNDMuMCIsInVwZGF0ZWRJblZlciI6IjM5LjE1Ni4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

uv encodes direct_url.json content differently than pip

4 participants