Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions crates/uv-cache-key/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,5 +18,7 @@ workspace = true

[dependencies]
hex = { workspace = true }
memchr = { workspace = true }
seahash = { workspace = true }
url = { workspace = true }
urlencoding = { workspace = true }
55 changes: 50 additions & 5 deletions crates/uv-cache-key/src/canonical_url.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
use std::borrow::Cow;
use std::fmt::{Debug, Formatter};
use std::hash::{Hash, Hasher};
use std::ops::Deref;
Expand Down Expand Up @@ -26,11 +27,6 @@ impl CanonicalUrl {
return Self(url);
}

// If the URL has no host, then it's not a valid URL anyway.
if !url.has_host() {
return Self(url);
}

// Strip credentials.
let _ = url.set_password(None);
let _ = url.set_username("");
Expand Down Expand Up @@ -76,6 +72,23 @@ impl CanonicalUrl {
}
}

// Decode any percent-encoded characters in the path.
if memchr::memchr(b'%', url.path().as_bytes()).is_some() {
let decoded = url
.path_segments()
.unwrap()
.map(|segment| {
urlencoding::decode(segment)
.unwrap_or(Cow::Borrowed(segment))
.into_owned()
})
.collect::<Vec<_>>();

let mut path_segments = url.path_segments_mut().unwrap();
path_segments.clear();
path_segments.extend(decoded);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this actually might be wrong. What if a slash is percent-encoded? We'd be changing it to a path segment.

I guess we could go the other way -- always percent-encode the URL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's also not quite right... Because if the URL is already percent-encoded, we'd be re-encoding it. It's not idempotent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we want to percent-decode each path segment, but not slashes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm actually not totally sure how to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried figuring this out with https://url.spec.whatwg.org but it's still unclear how to handle file urls vs. https urls. If we're eager there is a complete decoding algorithm there though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the segment contains a percent-encoded slash, though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, we could also consider just decoding +, if it's "special" (or something like that).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each segment is percent-encoded like in Url::parse or Url::join, except that % and / characters are also encoded (to %25 and %2F). This is unlike Url::parse where % is left as-is in case some of the input is already percent-encoded, and / denotes a path segment separator.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the spec it also seems that file:// and https:// have different rules, so we should test that our solution works for both


Self(url)
}

Expand Down Expand Up @@ -276,6 +289,38 @@ mod tests {
CanonicalUrl::parse("git+https:://github.com/pypa/sample-namespace-packages.git")?,
);

// Two URLs should _not_ be considered equal based on percent-decoding slashes.
assert_ne!(
CanonicalUrl::parse("https://github.com/pypa/sample%2Fnamespace%2Fpackages")?,
CanonicalUrl::parse("https://github.com/pypa/sample/namespace/packages")?,
);

// Two URLs should be considered equal regardless of percent-encoding.
assert_eq!(
CanonicalUrl::parse("https://github.com/pypa/sample%2Bnamespace%2Bpackages")?,
CanonicalUrl::parse("https://github.com/pypa/sample+namespace+packages")?,
);

// Two URLs should _not_ be considered equal based on percent-decoding slashes.
assert_ne!(
CanonicalUrl::parse(
"file:///home/ferris/my_project%2Fmy_project-0.1.0-py3-none-any.whl"
)?,
CanonicalUrl::parse(
"file:///home/ferris/my_project/my_project-0.1.0-py3-none-any.whl"
)?,
);

// Two URLs should be considered equal regardless of percent-encoding.
assert_eq!(
CanonicalUrl::parse(
"file:///home/ferris/my_project/my_project-0.1.0+foo-py3-none-any.whl"
)?,
CanonicalUrl::parse(
"file:///home/ferris/my_project/my_project-0.1.0%2Bfoo-py3-none-any.whl"
)?,
);

Ok(())
}

Expand Down
54 changes: 43 additions & 11 deletions crates/uv-distribution-types/src/any.rs
Original file line number Diff line number Diff line change
@@ -1,60 +1,92 @@
use std::hash::Hash;

use uv_cache_key::CanonicalUrl;
use uv_normalize::PackageName;
use uv_pep440::Version;

use crate::cached::CachedDist;
use crate::installed::InstalledDist;
use crate::{InstalledMetadata, InstalledVersion, Name};

/// A distribution which is either installable, is a wheel in our cache or is already installed.
///
/// Note equality and hash operations are only based on the name and version, not the kind.
/// Note equality and hash operations are only based on the name and canonical version, not the
/// kind.
#[derive(Debug, Clone, Eq)]
#[allow(clippy::large_enum_variant)]
pub enum LocalDist {
Cached(CachedDist),
Installed(InstalledDist),
Cached(CachedDist, CanonicalVersion),
Installed(InstalledDist, CanonicalVersion),
}

impl LocalDist {
fn canonical_version(&self) -> &CanonicalVersion {
match self {
Self::Cached(_, version) => version,
Self::Installed(_, version) => version,
}
}
}

impl Name for LocalDist {
fn name(&self) -> &PackageName {
match self {
Self::Cached(dist) => dist.name(),
Self::Installed(dist) => dist.name(),
Self::Cached(dist, _) => dist.name(),
Self::Installed(dist, _) => dist.name(),
}
}
}

impl InstalledMetadata for LocalDist {
fn installed_version(&self) -> InstalledVersion {
match self {
Self::Cached(dist) => dist.installed_version(),
Self::Installed(dist) => dist.installed_version(),
Self::Cached(dist, _) => dist.installed_version(),
Self::Installed(dist, _) => dist.installed_version(),
}
}
}

impl From<CachedDist> for LocalDist {
fn from(dist: CachedDist) -> Self {
Self::Cached(dist)
let version = CanonicalVersion::from(dist.installed_version());
Self::Cached(dist, version)
}
}

impl From<InstalledDist> for LocalDist {
fn from(dist: InstalledDist) -> Self {
Self::Installed(dist)
let version = CanonicalVersion::from(dist.installed_version());
Self::Installed(dist, version)
}
}

impl Hash for LocalDist {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.name().hash(state);
self.installed_version().hash(state);
self.canonical_version().hash(state);
}
}

impl PartialEq for LocalDist {
fn eq(&self, other: &Self) -> bool {
self.name() == other.name() && self.installed_version() == other.installed_version()
self.name() == other.name() && self.canonical_version() == other.canonical_version()
}
}

/// Like [`InstalledVersion`], but with [`CanonicalUrl`] to ensure robust URL comparisons.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum CanonicalVersion {
Version(Version),
Url(CanonicalUrl, Version),
}

impl From<InstalledVersion<'_>> for CanonicalVersion {
fn from(installed_version: InstalledVersion<'_>) -> Self {
match installed_version {
InstalledVersion::Version(version) => Self::Version(version.clone()),
InstalledVersion::Url(url, version) => {
Self::Url(CanonicalUrl::new(url), version.clone())
}
}
}
}
Loading