fix(namer): escape, rather than strip, non-ASCII ident. characters #7995

Open
ErichDonGubler wants to merge 2 commits into trunk from escape-utf-idents

Conversation

ErichDonGubler
Member

@ErichDonGubler ErichDonGubler commented Jul 23, 2025

Escape non-ASCII identifier characters with `write!(…, "u{:04x}", …)`, surrounding with `_` as appropriate. This solves (1) a debugging issue where stripped characters would otherwise be invisible, and (2) failure to re-validate that stripped identifiers didn't start with an ASCII digit.

I've confirmed that this fixes bug 1978197 on the Firefox side.
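
For readers skimming the thread, the escaping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `escape_ident` is a hypothetical name, the rule for which characters are kept (ASCII alphanumerics and `_`) is an assumption, and the real `Namer::sanitize` applies further rules that are not reproduced here.

```rust
use std::fmt::Write;

/// Hypothetical sketch: keep ASCII alphanumerics and `_`, and replace every
/// other character with `u{:04x}` surrounded by underscores.
fn escape_ident(input: &str) -> String {
    let mut s = String::with_capacity(input.len());
    for c in input.chars() {
        if c.is_ascii_alphanumeric() || c == '_' {
            s.push(c);
        } else {
            // Avoid doubling the separator if one is already present.
            let had_underscore_at_end = s.ends_with('_');
            if !s.is_empty() && !had_underscore_at_end {
                s.push('_');
            }
            // Writing into a `String` cannot fail, so `unwrap` is safe here.
            write!(s, "u{:04x}_", c as u32).unwrap();
        }
    }
    // Re-validate after rewriting: the result must not be empty and must not
    // start with an ASCII digit (assumed rule for this sketch).
    if s.is_empty() || s.starts_with(|c: char| c.is_ascii_digit()) {
        s.insert(0, '_');
    }
    s
}
```

With this sketch, `escape_ident("a<b")` yields `a_u003c_b`, and an input that starts with a digit, such as `9lives`, gets a leading `_`.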

Testing

Added a regression test.

Squash or Rebase?

squashplz

Checklist

  • If this contains user-facing changes, add a CHANGELOG.md entry.

@ErichDonGubler ErichDonGubler added the "type: bug", "naga", and "area: naga processing" labels Jul 23, 2025
@ErichDonGubler ErichDonGubler force-pushed the escape-utf-idents branch 2 times, most recently from e6b5270 to 0108909, July 23, 2025 19:19
@ErichDonGubler ErichDonGubler marked this pull request as ready for review July 23, 2025 19:19
Contributor

@andyleiserson andyleiserson left a comment

+1 for not generating illegal identifiers 😄

if !s.is_empty() && !had_underscore_at_end {
    s.push('_');
}
write!(s, "u{:04x}_", c as u32).unwrap();
Contributor

Not important, since you've covered it with snapshot tests, but one thing I've noticed about naga/wgpu is that we don't have a lot of unit tests, and this behavior seems like a good candidate for unit testing. (But as I said, not important, I don't think it's worth going back and changing/adding the tests, this is more a reminder to be thinking about unit tests in general.)

Member Author

Agreed on the scope; I also think it would be nice to use several snapshot-ish unit tests in follow-up work, just to make it easier to reason about some changes.

@@ -5,11 +5,11 @@ precision highp int;

 layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

-struct _atomic_compare_exchange_resultSint4_ {
+struct _atomic_compare_exchange_result_u003c_Sint_u002c_4_u003e {
Contributor

It's slightly unfortunate to use so many characters for an internally-generated type name. Maybe there could be a rule like "any number of consecutive :<>, characters are mapped to a single underscore"?

Member Author

Maybe something like 3a4c931?

Contributor

Yes, that looks great to me.
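
The rule proposed in this thread ("any number of consecutive `:<>,` characters are mapped to a single underscore") could look something like the sketch below. It is only an illustration of that rule; `collapse_separators` is a hypothetical name, and nothing here is taken from the referenced commit.

```rust
/// Hypothetical helper: collapse each run of `:`, `<`, `>`, and `,` into a
/// single `_`, without doubling an underscore that is already present.
fn collapse_separators(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if matches!(c, ':' | '<' | '>' | ',') {
            // Emit one separator per run, and none if the output already
            // ends in `_`.
            if !out.ends_with('_') {
                out.push('_');
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Under this rule a generated type name such as `result<Sint,4>` becomes `result_Sint_4_`, which is far shorter than the per-character escapes shown in the snapshot diff.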

@ErichDonGubler
Member Author

Just filed https://treeherder.mozilla.org/jobs?repo=try&landoCommitID=144433 to see if this breaks anything. AFAIK it's unusual for us to change stuff like this, though I'm not concerned about anything concrete.

Member

@jimblandy jimblandy left a comment

I'm not really comfortable with the way this affects so many identifiers that don't contain non-ASCII characters. Naga does not promise to preserve identifier names at all; we could just name everything e1, e2. There's no benefit to us guaranteeing that the original identifier can be reconstructed from the identifiers we generate.

@jimblandy
Member

jimblandy commented Jul 31, 2025

The only job of Namer::sanitize is to produce a valid identifier prefix quickly and simply. Preserving the original name is just for our convenience in debugging; there is no contract. We should not double the size of this code just to delete characters safely.

For example, a better fix might be to delete almost all of that function, and then say: if the string works as-is, return it; otherwise, return "e". That's the kind of direction we want to be headed here, not growing our own identifier mangling syntax.
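
As a rough illustration of the alternative suggested here (return the string if it works as-is, otherwise return "e"), a sketch might look like this. Both function names are hypothetical, and the validity check (an ASCII letter or `_` first, then ASCII alphanumerics or `_`) is an assumption; real backends have additional constraints, such as reserved words, that are not modeled.

```rust
/// Assumed validity rule for this sketch: non-empty, starts with an ASCII
/// letter or `_`, and continues with ASCII alphanumerics or `_`.
fn is_plain_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical simplified sanitizer: keep the name if it is already usable,
/// otherwise fall back to a fixed prefix.
fn sanitize_or_fallback(s: &str) -> &str {
    if is_plain_ident(s) {
        s
    } else {
        "e"
    }
}
```

The trade-off is the one debated in this thread: the function stays tiny and never allocates, but any name that needs changing loses its original spelling entirely.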

Escape non-ASCII identifier characters with `write!(…, "u{:04x}", …)`,
surrounding with `_` as appropriate. This solves (1) a debugging issue
where stripped characters would otherwise be invisible, and (2) failure
to re-validate that stripped identifiers didn't start with an ASCII
digit.

I've confirmed that this fixes [bug
1978197](https://bugzilla.mozilla.org/show_bug.cgi?id=1978197) on the
Firefox side.
3 participants