-
Notifications
You must be signed in to change notification settings - Fork 13.9k
AST: Give spans to all identifiers #49154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
r? @pnkfelix (rust_highfive has picked a reviewer for you, use r? to override) |
|
cc @jseyfried @nrc @rust-lang/compiler |
|
👍 |
|
This looks great! cc @michaelwoerister on potential interactions with incremental hashing. |
|
Epic PR |
|
Somewhat odd failure on Travis: |
|
/subscribe: follow up after landing by removing as many |
Lifetimes are also represented with |
src/librustc/ich/impls_hir.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelwoerister
Now I'm pretty sure this is a mistake (and ignoring ctxt was a mistake as well?).
Spans are hashed in other positions, including the ctxt part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should be updated! Good catch.
src/libsyntax_pos/symbol.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these Hash and PartialEq I've kept the old behavior - compare and hash idents as (Name, SyntaxContext).
PartialEq for Ident should either
- not exist (I'd prefer this, but larger structures containing idents want to derive
PartialEqandHashfor whatever reasons), or - behave like
PartialEqfor(Name, SyntaxContext).PartialEqnot behaving like this is a notable footgun and source of bugs, we had it couple years ago. Well, andHashshould has to be implemented consistently withPartialEq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to derive
PartialEqandHashfor whatever reasons
What happens if you just remove those? Is PartialEq it used only for tests? Could those tests serialize to JSON, use pattern matching, or do something else instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try.
IIRC, at least token::Token contains Ident and uses == all the time for "simple" variants not actually containing Idents or other data (the case that would be perfectly supported by is, btw).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's a bit of a bummer - and yeah, I agree about is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, should Token contain Ident, or just Symbol (and keep the Span separate)? Or is that too much hassle / not readily supported by existing infrastructure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If lots of spans get moved from the regular HIR nodes to Ident and then we don't hash them, that would not be good. But the HashStable implementation for Ident doesn't ignore the span, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, StableHash doesn't ignore the span (well, after fixing #49154 (comment)).
Also,
regular HIR nodes
this PR doesn't touch HIR yet, only AST.
|
@petrochenkov very cool |
|
☔ The latest upstream changes (presumably #49308) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Updated (there are couple of new commits). |
|
☔ The latest upstream changes (presumably #49101) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Ping from triage, @eddyb ! Will you have time to review this soon? FYI, @petrochenkov — you've got some merge conflicts. |
|
|
src/librustc/ich/impls_hir.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @michaelwoerister (making sure you see this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mw previously confirmed the span should be hashed here - #49154 (comment)
src/libsyntax/feature_gate.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's happening here. Do you have an example where the behavior changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a fix for the regression caught by Travis and mentioned in #49154 (comment).
The problem is that here we want two different SyntaxContexts for identifiers generated by #[test] - one context for name resolution (unhygienic) and another one for stability checking (hygienic, stability is not checked), but Ident has only one SyntaxContext.
So we can keep the stability checking context in something that is not used for name resolution, for example whole Path (or anything else that is not an identifier/lifetime).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Can the source comment be expanded to say more of what is happening from its own perspective? e.g. starting with "Here we check the span from the whole Path instead of that of individual Idents specifically because only the former can be ...".
|
Updated. |
AST: Give spans to all identifiers
Change representation of `ast::Ident` from `{ name: Symbol, ctxt: SyntaxContext }` to `{ name: Symbol, span: Span }`.
Syntax contexts still can be extracted from spans (`span.ctxt()`).
Why this should not require more memory:
- `Span` is `u32` just like `SyntaxContext`.
- Despite keeping more spans in AST we don't actually *create* more spans, so the number of "outlined" spans kept in span interner shouldn't become larger.
Why this may be slightly slower:
- When we need to extract ctxt from an identifier instead of just field read we need to do bit field extraction possibly followed by and access by index into span interner's vector. Both operations should be fast (unless the span interner is under some synchronization) and we already do ctxt extraction from spans all the time during macro expansion, so the difference should be lost in noise.
cc #48842 (comment)
|
☀️ Test successful - status-appveyor, status-travis |
|
Is it possible that this PR regressed expansion info on derives? The code generated by #[derive(Debug)]
pub enum Error {
Type(
&'static str,
),
}contains a |
|
@oli-obk In this specific example though, it looks like const __self_0: u8 = 0;
#[derive(Debug)]
pub enum Error {
Type(
&'static str,
),
}
fn main() {}
---- On stable
error[E0530]: match bindings cannot shadow constants
--> src/main.rs:6:9
|
1 | const __self_0: u8 = 0;
| ----------------------- a constant `__self_0` is defined here
...
6 | &'static str,
| ^^^^^^^^^^^^^ cannot be named the same as a constantOr you are talking about EDIT: I see, it's about of whole |
|
@petrochenkov the current reproduction is a clippy lint reporting things that happened inside the expansion, but reporting it at the span of the type. The relevant lint is https://github.com/rust-lang-nursery/rust-clippy/blob/master/clippy_lints/src/needless_borrow.rs#L86 We do check for macro expansions here, which uses the |
|
We took the span from a |
Discovered in rust-lang#50061 we're falling off the "happy path" of using a stringified token stream more often than we should. This was due to the fact that a user-written token like `0xf` is equality-different from the stringified token of `15` (despite being semantically equivalent). This patch updates the call to `eq_unspanned` with an even more awful solution, `probably_equal_for_proc_macro`, which ignores the value of each token and basically only compares the structure of the token stream, assuming that the AST doesn't change just one token at a time. While this is a step towards fixing rust-lang#50061 there is still one regression from rust-lang#49154 which needs to be fixed.
proc_macro: Stay on the "use the cache" path more Discovered in #50061 we're falling off the "happy path" of using a stringified token stream more often than we should. This was due to the fact that a user-written token like `0xf` is equality-different from the stringified token of `15` (despite being semantically equivalent). This patch updates the call to `eq_unspanned` with an even more awful solution, `probably_equal_for_proc_macro`, which ignores the value of each token and basically only compares the structure of the token stream, assuming that the AST doesn't change just one token at a time. While this is a step towards fixing #50061 there is still one regression from #49154 which needs to be fixed.
Change representation of
ast::Identfrom{ name: Symbol, ctxt: SyntaxContext }to{ name: Symbol, span: Span }.Syntax contexts still can be extracted from spans (
span.ctxt()).Why this should not require more memory:
Spanisu32just likeSyntaxContext.Why this may be slightly slower:
cc #48842 (comment)