-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Cleanup split between lexer and sliding-text-window #79205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| { | ||
| internal readonly SlidingTextWindow TextWindow; | ||
| private List<SyntaxDiagnosticInfo>? _errors; | ||
| protected int LexemeStartPosition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to lexer-layer from SlidingTextWindow
| => TextWindow.GetText(LexemeStartPosition, intern: false); | ||
|
|
||
| protected string GetInternedLexemeText() | ||
| => TextWindow.GetText(LexemeStartPosition, intern: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these .GetText helpers on the TextWindow were used all over the place. MOved to two simple helpers on Lexer for getting either an interned or non-interned chunk of text for the current lexeme.
| => TextWindow.GetText(LexemeStartPosition, intern: true); | ||
|
|
||
| protected int CurrentLexemeWidth | ||
| => this.TextWindow.Position - LexemeStartPosition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was previously exposed as the (imo) confusingly named SlidingTextWindow.Width.
I tried to ensure that helpers related to lexemes used that term in their name to keep it clear what 'width/count/etc.' things are referring to.
| internal SyntaxTrivia LookupWhitespaceTrivia( | ||
| SlidingTextWindow textWindow, | ||
| int lexemeStartPosition, | ||
| int hashCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easier to just read as the new function. There was only ever one caller ofr LookupTrivia, and it always used the same generic arg.
| if (this.HasErrors) | ||
| { | ||
| var afterStartDelimiter = TextWindow.LexemeStartPosition + startingQuoteCount; | ||
| var afterStartDelimiter = this.LexemeStartPosition + startingQuoteCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference to textWindow.LexemeStartPosition were mechanically replaced with this.LexemeStartPosition
| } | ||
|
|
||
| info.Text = TextWindow.GetText(intern: true); | ||
| info.Text = this.GetInternedLexemeText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
References to TextWindow.GetText(intern: true) and TextWindow.GetText(intern: false) were mechanically updated to this.GetInternedLexemeText() and this.GetNonInternedLexemeText() mechanically.
|
|
||
| // The max index in charWindow that we will quick scan to. This is either the end of the window | ||
| // or the position of the largest token we'd be willing to quick scan and cache. | ||
| var maxIndexInWindow = Math.Min(charWindow.Length, startIndexInWindow + MaxCachedTokenSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed these all for clarity.
|
@dotnet/roslyn-compiler for another pair of eyes. |
|
@dotnet/roslyn-compiler for another pair of eyes. thanks! |
| _strings = StringTable.GetInstance(); | ||
| _characterWindow = s_windowPool.Allocate(); | ||
| _lexemeStart = 0; | ||
| _characterWindow = new(s_windowPool.Allocate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think target-typed new() is disallowed in the compiler layer.
| return _characterWindow; | ||
| } | ||
| } | ||
| public readonly ReadOnlySpan<char> CurrentWindowSpan => _characterWindow.AsSpan(_positionInText - _characterWindowStartPositionInText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that _positionInText and the character window might not always intersect. For example, Advance methods only move the position but not the window. Could it happen that the AsSpan would fail then? E.g., because the offset would be negative (position would be before the window).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Good call. I will fix that up.
|
|
||
| for (; i < n; i++) | ||
| // Where we are currently pointing in the charWindow as we read in a character at a time. | ||
| var currentIndex = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this inside the for initializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. it is read below (it is used to figure out how many characters were actually read out).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c
| return _characterWindow; | ||
| } | ||
| } | ||
| public readonly ReadOnlySpan<char> CurrentWindowSpan => _characterWindow.AsSpan(_positionInText - _characterWindowStartPositionInText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Good call. I will fix that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c
Co-authored-by: Jan Jones <[email protected]>
| { | ||
| get | ||
| { | ||
| return _positionInText > _textEnd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look like the right check. If position is after end, that's shouldn't cause problems. But the _positionInText - _characterWindowStartPositionInText offset computed below can be negative if _characterWindowStartPositionInText is after _positionInText, so I would expect this check to be something like:
| return _positionInText > _textEnd | |
| return _positionInText < _characterWindowStartPositionInText |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both woudl actually cause problems (since AsSpan checks start/end pos). I'll validate both sides :)
The general philosophy here is that the SlidingTextWindow is just a representation of characters, and doesn't really care anything about "current lexeme". Instead, that entire concept it contained entirely within the lexer itself.
This helps allow us to tweak these components with less risk of regression (like the recent regression around lexing out a
dotand ensuring it was not a..in a range).