AST: Give spans to all identifiers #49154

petrochenkov · 2018-03-19T00:59:47Z

Change representation of ast::Ident from { name: Symbol, ctxt: SyntaxContext } to { name: Symbol, span: Span }.
Syntax contexts still can be extracted from spans (span.ctxt()).

Why this should not require more memory:

Span is u32 just like SyntaxContext.
Despite keeping more spans in AST we don't actually create more spans, so the number of "outlined" spans kept in span interner shouldn't become larger.

Why this may be slightly slower:

When we need to extract ctxt from an identifier, instead of a plain field read we need to do a bit field extraction, possibly followed by an access by index into the span interner's vector. Both operations should be fast (unless the span interner is under some synchronization), and we already do ctxt extraction from spans all the time during macro expansion, so the difference should be lost in the noise.
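
The cost described above can be illustrated with a toy model. This is only a sketch under assumptions: the bit layout, the tag bit, and the names `SpanData`, `SpanInterner`, and `Span::inline` are hypothetical stand-ins and are not meant to reproduce rustc's actual encoding.

```rust
/// Hypothetical stand-in for a hygiene context id.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SyntaxContext(u32);

/// Full ("outlined") span data kept in the interner.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
struct SpanData {
    lo: u32,
    hi: u32,
    ctxt: SyntaxContext,
}

/// Compact 32-bit span. Assumed layout (illustrative only): bit 0 is a tag;
/// tag 0 means the position is stored inline and the context is the root,
/// tag 1 means the remaining bits index into the interner.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span(u32);

const TAG_INTERNED: u32 = 1;

impl Span {
    /// Inline form: 24 bits of start offset, 7 bits of length, tag bit 0.
    fn inline(lo: u32, len: u32) -> Span {
        assert!(lo < (1 << 24) && len < (1 << 7));
        Span((lo << 8) | (len << 1))
    }
}

/// Toy interner: a vector indexed by the payload of an interned span.
#[derive(Default)]
struct SpanInterner {
    spans: Vec<SpanData>,
}

impl SpanInterner {
    fn intern(&mut self, data: SpanData) -> Span {
        let index = self.spans.len() as u32;
        self.spans.push(data);
        Span((index << 1) | TAG_INTERNED)
    }

    /// Bit-field extraction, possibly followed by an indexed lookup.
    fn ctxt(&self, span: Span) -> SyntaxContext {
        if span.0 & 1 == TAG_INTERNED {
            self.spans[(span.0 >> 1) as usize].ctxt
        } else {
            // Inline spans carry no context bits in this model.
            SyntaxContext(0)
        }
    }
}

fn main() {
    let mut interner = SpanInterner::default();
    let expanded = interner.intern(SpanData { lo: 10, hi: 20, ctxt: SyntaxContext(7) });
    let plain = Span::inline(3, 4);
    println!("{:?} {:?}", interner.ctxt(expanded), interner.ctxt(plain));
}
```

In this model the common case (an inline span) costs one mask and one branch, and only spans that do not fit inline pay for the vector access.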

cc #48842 (comment)

rust-highfive · 2018-03-19T00:59:58Z

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

petrochenkov · 2018-03-19T01:00:17Z

cc @jseyfried @nrc @rust-lang/compiler
r? @eddyb

nrc · 2018-03-19T02:01:37Z

👍

eddyb · 2018-03-19T14:37:45Z

This looks great! cc @michaelwoerister on potential interactions with incremental hashing.
r=me after Travis CI build is fixed and mw had a look at it.

michaelwoerister · 2018-03-19T15:21:42Z

Epic PR :) I think this should be fine for incr. comp., we already hash syntax contexts and spans.

Mark-Simulacrum · 2018-03-19T18:33:49Z

Somewhat odd failure on Travis:

[00:50:03] ---- [run-pass] run-pass/rfc-2126-extern-absolute-paths/test.rs stdout ----
[00:50:03] 	
[00:50:03] error: compilation failed!
[00:50:03] status: exit code: 101
[00:50:03] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/run-pass/rfc-2126-extern-absolute-paths/test.rs" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass" "--target=x86_64-unknown-linux-gnu" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/rfc-2126-extern-absolute-paths/test.stage2-x86_64-unknown-linux-gnu" "-Crpath" "-O" "-Zmiri" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--test" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/rfc-2126-extern-absolute-paths/test.stage2-x86_64-unknown-linux-gnu.aux"
[00:50:03] stdout:
[00:50:03] ------------------------------------------
[00:50:03] 
[00:50:03] ------------------------------------------
[00:50:03] stderr:
[00:50:03] ------------------------------------------
[00:50:03] error[E0658]: `crate` in paths is experimental (see issue #45477)
[00:50:03]   --> /checkout/src/test/run-pass/rfc-2126-extern-absolute-paths/test.rs:20:1
[00:50:03]    |
[00:50:03] 20 | / fn test() {
[00:50:03] 21 | | }
[00:50:03]    | |_^
[00:50:03]    |
[00:50:03]    = help: add #![feature(crate_in_paths)] to the crate attributes to enable

estebank · 2018-03-19T18:36:44Z

/subscribe: follow up after landing by removing as many codemap().def_span() calls throughout the codebase as possible. (I don't think it will be possible to remove all, as some are spans for lifetimes, but the ones for struct/trait/impls errors should use the ident spans.)

petrochenkov · 2018-03-19T19:13:47Z

@estebank

some are spans for lifetimes

Lifetimes are also represented with Ident so they get spans as well.
Also, this is only the first half of "giving spans to all identifiers" - I haven't converted HIR to use Ident yet, this is going to be a separate large PR.

petrochenkov · 2018-03-19T19:15:29Z

src/librustc/ich/impls_hir.rs

@@ -653,7 +653,7 @@ impl<'a> HashStable<StableHashingContext<'a>> for ast::Ident {
                                          hasher: &mut StableHasher<W>) {
        let ast::Ident {
            ref name,
-            ctxt: _ // Ignore this
+            span: _ // Ignore this


@michaelwoerister
Now I'm pretty sure this is a mistake (and ignoring ctxt was a mistake as well?).
Spans are hashed in other positions, including the ctxt part.

Yes, that should be updated! Good catch.

petrochenkov · 2018-03-19T19:21:39Z

src/libsyntax_pos/symbol.rs

+impl Hash for Ident {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.name.hash(state);
+        self.span.ctxt().hash(state);


For these Hash and PartialEq I've kept the old behavior - compare and hash idents as (Name, SyntaxContext).

PartialEq for Ident should either

not exist (I'd prefer this, but larger structures containing idents want to derive PartialEq and Hash for whatever reasons), or

behave like PartialEq for (Name, SyntaxContext). PartialEq not behaving like this is a notable footgun and source of bugs; we had it a couple of years ago. And Hash has to be implemented consistently with PartialEq.
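
A minimal sketch of the second option, assuming simplified stand-in types (`Symbol`, `SyntaxContext`, and `Span` here are hypothetical reductions, not the compiler's definitions). It shows equality and hashing that look only at the name and the context taken from the span, so the two impls stay consistent with each other:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct SyntaxContext(u32);

/// Simplified span: source positions plus a hygiene context.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
struct Span {
    lo: u32,
    hi: u32,
    ctxt: SyntaxContext,
}

impl Span {
    fn ctxt(self) -> SyntaxContext {
        self.ctxt
    }
}

#[derive(Clone, Copy, Debug)]
struct Ident {
    name: Symbol,
    span: Span,
}

// Compare as (Name, SyntaxContext); source positions are ignored.
impl PartialEq for Ident {
    fn eq(&self, other: &Ident) -> bool {
        self.name == other.name && self.span.ctxt() == other.span.ctxt()
    }
}

impl Eq for Ident {}

// Hash exactly the fields that `eq` compares.
impl Hash for Ident {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.span.ctxt().hash(state);
    }
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let root = SyntaxContext(0);
    let a = Ident { name: Symbol(1), span: Span { lo: 0, hi: 3, ctxt: root } };
    let b = Ident { name: Symbol(1), span: Span { lo: 40, hi: 43, ctxt: root } };
    let c = Ident { name: Symbol(1), span: Span { lo: 0, hi: 3, ctxt: SyntaxContext(5) } };
    println!("a == b: {}, a == c: {}", a == b, a == c);
    println!("same hash for a and b: {}", hash_of(&a) == hash_of(&b));
}
```

Two identifiers written at different positions but with the same name and context compare equal, while the same name under a different context does not.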

want to derive PartialEq and Hash for whatever reasons

What happens if you just remove those? Is PartialEq used only for tests? Could those tests serialize to JSON, use pattern matching, or do something else instead?

I'll try.
IIRC, at least token::Token contains Ident and uses == all the time for "simple" variants not actually containing Idents or other data (the case that would be perfectly supported by is, btw).
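
For illustration only: the `is` expression mentioned here is not an available language feature, but the standard library's `matches!` macro (added to Rust well after this discussion) covers the same "simple variant" check without requiring PartialEq. The `Token` and `Ident` types below are hypothetical toy versions, not the ones in libsyntax.

```rust
/// Hypothetical identifier type that deliberately does NOT implement PartialEq.
#[derive(Debug)]
struct Ident {
    name: u32,
}

/// Toy token type; without PartialEq on Ident it cannot derive PartialEq either.
#[derive(Debug)]
enum Token {
    Comma,
    Semi,
    Ident(Ident),
}

/// Variant check by pattern, so no `==` on tokens is needed.
fn is_comma(tok: &Token) -> bool {
    matches!(tok, Token::Comma)
}

/// Payload access still goes through ordinary pattern matching.
fn ident_name(tok: &Token) -> Option<u32> {
    match tok {
        Token::Ident(ident) => Some(ident.name),
        _ => None,
    }
}

fn main() {
    let tokens = [Token::Ident(Ident { name: 1 }), Token::Comma, Token::Semi];
    for tok in &tokens {
        println!("{:?}: comma = {}, ident = {:?}", tok, is_comma(tok), ident_name(tok));
    }
}
```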

Oh, that's a bit of a bummer - and yeah, I agree about is.

On second thought, should Token contain Ident, or just Symbol (and keep the Span separate)? Or is that too much hassle / not readily supported by existing infrastructure?

If lots of spans get moved from the regular HIR nodes to Ident and then we don't hash them, that would not be good. But the HashStable implementation for Ident doesn't ignore the span, right?

Yes, HashStable doesn't ignore the span (well, after fixing #49154 (comment)).

Also,

regular HIR nodes

this PR doesn't touch HIR yet, only AST.

nikomatsakis · 2018-03-19T20:38:44Z

@petrochenkov very cool

bors · 2018-03-23T21:41:43Z

☔ The latest upstream changes (presumably #49308) made this pull request unmergeable. Please resolve the merge conflicts.

petrochenkov · 2018-03-25T16:02:02Z

Updated (there are a couple of new commits).

bors · 2018-03-26T21:21:36Z

☔ The latest upstream changes (presumably #49101) made this pull request unmergeable. Please resolve the merge conflicts.

shepmaster · 2018-03-30T19:39:02Z

Ping from triage, @eddyb ! Will you have time to review this soon?

FYI, @petrochenkov — you've got some merge conflicts.

eddyb · 2018-03-30T22:57:17Z

~~@shepmaster I already reviewed in #49154 (comment).~~ I see, #49154 (comment)

eddyb · 2018-03-30T22:58:39Z

src/librustc/ich/impls_hir.rs

        } = *self;

        name.hash_stable(hcx, hasher);
+        span.hash_stable(hcx, hasher);


cc @michaelwoerister (making sure you see this)

mw previously confirmed the span should be hashed here - #49154 (comment)

eddyb · 2018-03-30T23:15:43Z

src/libsyntax/feature_gate.rs

@@ -1774,11 +1774,14 @@ impl<'a> Visitor<'a> for PostExpansionVisitor<'a> {

    fn visit_path(&mut self, path: &'a ast::Path, _id: NodeId) {
        for segment in &path.segments {
+            // Context of ident spans cannot be overriden to ignore unstable features,
+            // so replace it with path span context.


I'm not sure what's happening here. Do you have an example where the behavior changes?

It's a fix for the regression caught by Travis and mentioned in #49154 (comment).

The problem is that here we want two different SyntaxContexts for identifiers generated by #[test] - one context for name resolution (unhygienic) and another one for stability checking (hygienic, stability is not checked), but Ident has only one SyntaxContext.

So we can keep the stability checking context in something that is not used for name resolution, for example whole Path (or anything else that is not an identifier/lifetime).
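
A rough sketch of that idea, under heavy simplification: here a context records only whether unstable features are allowed, and `check_path` is a hypothetical stand-in for the feature-gate visitor, not the code in feature_gate.rs. The check consults the context of the whole path and leaves the identifier's own context free for name resolution.

```rust
/// Hypothetical, heavily simplified model: a context only records whether
/// unstable features are allowed for code carrying it.
#[derive(Clone, Copy)]
struct SyntaxContext {
    allows_unstable: bool,
}

#[derive(Clone, Copy)]
struct Span {
    ctxt: SyntaxContext,
}

/// The identifier's span context is reserved for name resolution in this
/// model, so the gate check below never reads it.
#[allow(dead_code)]
struct Ident {
    name: &'static str,
    span: Span,
}

struct Path {
    span: Span,
    segments: Vec<Ident>,
}

/// Returns one message per gated `crate` segment, using the path's context.
fn check_path(path: &Path) -> Vec<String> {
    let mut errors = Vec::new();
    for segment in &path.segments {
        if segment.name == "crate" && !path.span.ctxt.allows_unstable {
            errors.push("`crate` in paths is experimental".to_string());
        }
    }
    errors
}

fn main() {
    let user = SyntaxContext { allows_unstable: false };
    let generated = SyntaxContext { allows_unstable: true };

    // Path as a test harness might generate it: permissive path context,
    // ordinary identifier context.
    let from_harness = Path {
        span: Span { ctxt: generated },
        segments: vec![Ident { name: "crate", span: Span { ctxt: user } }],
    };
    // The same path written by hand without the feature enabled.
    let hand_written = Path {
        span: Span { ctxt: user },
        segments: vec![Ident { name: "crate", span: Span { ctxt: user } }],
    };

    println!("harness errors: {}", check_path(&from_harness).len());
    println!("hand-written errors: {}", check_path(&hand_written).len());
}
```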

Oh I see. Can the source comment be expanded to say more of what is happening from its own perspective? e.g. starting with "Here we check the span from the whole Path instead of that of individual Idents specifically because only the former can be ...".

petrochenkov · 2018-04-03T00:39:30Z

Updated.

bors · 2018-04-06T12:03:11Z

☀️ Test successful - status-appveyor, status-travis
Approved by: eddyb
Pushing a143462 to master...

oli-obk · 2018-04-07T09:01:19Z

Is it possible that this PR regressed expansion info on derives?

The code generated by derive(Debug) in the following

#[derive(Debug)]
pub enum Error {
    Type(
        &'static str,
    ),
}

contains a ref __self_0 pattern whose span is the unchanged span of &'static str above.

petrochenkov · 2018-04-07T11:18:30Z

@oli-obk
Sure, this is a relatively large refactoring, I tried to be careful, but something could break accidentally.

In this specific example though, it looks like __self_0 always had span of &'static str:

const __self_0: u8 = 0;

#[derive(Debug)]
pub enum Error {
    Type(
        &'static str,
    ),
}

fn main() {}

---- On stable

error[E0530]: match bindings cannot shadow constants
 --> src/main.rs:6:9
  |
1 | const __self_0: u8 = 0;
  | ----------------------- a constant `__self_0` is defined here
...
6 |         &'static str,
  |         ^^^^^^^^^^^^^ cannot be named the same as a constant

Or you are talking about SyntaxContext? Could you give a reproduction then?

EDIT: I see, it's about the span of the whole ref __self_0 pattern. A reproduction would still be appreciated then.

oli-obk · 2018-04-07T11:24:57Z

@petrochenkov the current reproduction is a clippy lint reporting things that happened inside the expansion, but reporting it at the span of the type.

The relevant lint is https://github.com/rust-lang-nursery/rust-clippy/blob/master/clippy_lints/src/needless_borrow.rs#L86

We do check for macro expansions here, which uses the expn_info to figure out whether any expansion has happened (implemented here)

petrochenkov · 2018-04-07T11:30:25Z

@oli-obk
Before I look into that Clippy code, you may be interested in the last commit - 1458684.
Spans of identifiers themselves cannot be used for detecting macro expansions anymore, you have to use spans from some larger context. This may affect Clippy as well.

oli-obk · 2018-04-07T11:35:10Z

We took the span from a Pat, not an Ident, but that Pat might've gotten its span from an identifier. I'll have a look at the expansion code of Debug derives.

proc_macro: Stay on the "use the cache" path more

Discovered in #50061 we're falling off the "happy path" of using a stringified token stream more often than we should. This was due to the fact that a user-written token like `0xf` is equality-different from the stringified token of `15` (despite being semantically equivalent).

This patch updates the call to `eq_unspanned` with an even more awful solution, `probably_equal_for_proc_macro`, which ignores the value of each token and basically only compares the structure of the token stream, assuming that the AST doesn't change just one token at a time.

While this is a step towards fixing #50061 there is still one regression from #49154 which needs to be fixed.

rust-highfive assigned pnkfelix Mar 19, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 19, 2018

rust-highfive assigned eddyb and unassigned pnkfelix Mar 19, 2018

This was referenced Mar 22, 2018

Use InternedString instead of Symbol for type parameter types #49266

Closed

syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300

Closed

petrochenkov mentioned this pull request Mar 24, 2018

macros: Remove matching on "complex" nonterminals requiring AST comparisons #49326

Merged

petrochenkov force-pushed the spident branch from 7db4610 to 3162ab0 Compare March 25, 2018 15:51

petrochenkov changed the title ~~[WIP] AST: Give spans to all identifiers~~ AST: Give spans to all identifiers Mar 25, 2018

eddyb reviewed Mar 30, 2018

oli-obk mentioned this pull request Mar 31, 2018

WIP: Make too-many-arguments errors span fn header only rust-lang/rust-clippy#2599

Closed

petrochenkov force-pushed the spident branch from 3162ab0 to 6055bc8 Compare April 3, 2018 00:39

bors merged commit 1458684 into rust-lang:master Apr 6, 2018

topecongiro mentioned this pull request Apr 6, 2018

Cargo update rust-lang/rustfmt#2602

Merged

ghost mentioned this pull request Apr 7, 2018

Fix compilation for nightly 2018-04-06 rust-lang/rust-clippy#2640

Merged

topecongiro mentioned this pull request Apr 7, 2018

Update rustc-ap-syntax rust-lang/rustfmt#2604

Closed

kennytm mentioned this pull request Apr 8, 2018

"died due to signal 11" in atomic::static_init libcore test on Android #49775

Closed

petrochenkov mentioned this pull request Apr 18, 2018

Hygiene opt-out for idents in expansion of declarative macros #47992

Closed

This was referenced Apr 19, 2018

Hygiene break in macros involving string containing single quote #50061

Closed

v0.1.1 has a token hygeiene bug. alexcrichton/futures-await#95

Closed

alexcrichton mentioned this pull request Apr 19, 2018

proc_macro: Stay on the "use the cache" path more #50069

Merged

phansch mentioned this pull request Apr 26, 2018

False positive on similar_names? rust-lang/rust-clippy#2651

Closed

petrochenkov mentioned this pull request May 1, 2018

TokenStream::parse does not resolve with Span::call_site() #50050

Closed

phansch mentioned this pull request May 10, 2018

warn(needless_borrow) creates false positive when using derive(Debug) rust-lang/rust-clippy#2740

Closed

petrochenkov deleted the spident branch June 5, 2019 16:05
