Skip to content

Conversation

@CyrusNajmabadi
Copy link
Member

The general philosophy here is that the SlidingTextWindow is just a representation of characters, and doesn't really care anything about "current lexeme". Instead, that entire concept it contained entirely within the lexer itself.

This helps allow us to tweak these components with less risk of regression (like the recent regression around lexing out a dot and ensuring it was not a .. in a range).

{
internal readonly SlidingTextWindow TextWindow;
private List<SyntaxDiagnosticInfo>? _errors;
protected int LexemeStartPosition;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to lexer-layer from SlidingTextWindow

=> TextWindow.GetText(LexemeStartPosition, intern: false);

protected string GetInternedLexemeText()
=> TextWindow.GetText(LexemeStartPosition, intern: true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these .GetText helpers on the TextWindow were used all over the place. MOved to two simple helpers on Lexer for getting either an interned or non-interned chunk of text for the current lexeme.

=> TextWindow.GetText(LexemeStartPosition, intern: true);

protected int CurrentLexemeWidth
=> this.TextWindow.Position - LexemeStartPosition;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was previously exposed as the (imo) confusingly named SlidingTextWindow.Width.

I tried to ensure that helpers related to lexemes used that term in their name to keep it clear what 'width/count/etc.' things are referring to.

internal SyntaxTrivia LookupWhitespaceTrivia(
SlidingTextWindow textWindow,
int lexemeStartPosition,
int hashCode)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easier to just read as the new function. There was only ever one caller ofr LookupTrivia, and it always used the same generic arg.

if (this.HasErrors)
{
var afterStartDelimiter = TextWindow.LexemeStartPosition + startingQuoteCount;
var afterStartDelimiter = this.LexemeStartPosition + startingQuoteCount;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference to textWindow.LexemeStartPosition were mechanically replaced with this.LexemeStartPosition

}

info.Text = TextWindow.GetText(intern: true);
info.Text = this.GetInternedLexemeText();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References to TextWindow.GetText(intern: true) and TextWindow.GetText(intern: false) were mechanically updated to this.GetInternedLexemeText() and this.GetNonInternedLexemeText() mechanically.


// The max index in charWindow that we will quick scan to. This is either the end of the window
// or the position of the largest token we'd be willing to quick scan and cache.
var maxIndexInWindow = Math.Min(charWindow.Length, startIndexInWindow + MaxCachedTokenSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed these all for clarity.

@CyrusNajmabadi
Copy link
Member Author

@dotnet/roslyn-compiler for another pair of eyes.

@CyrusNajmabadi
Copy link
Member Author

@dotnet/roslyn-compiler for another pair of eyes. thanks!

_strings = StringTable.GetInstance();
_characterWindow = s_windowPool.Allocate();
_lexemeStart = 0;
_characterWindow = new(s_windowPool.Allocate());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think target-typed new() is disallowed in the compiler layer.

return _characterWindow;
}
}
public readonly ReadOnlySpan<char> CurrentWindowSpan => _characterWindow.AsSpan(_positionInText - _characterWindowStartPositionInText);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that _positionInText and the character window might not always intersect. For example, Advance methods only move the position but not the window. Could it happen that the AsSpan would fail then? E.g., because the offset would be negative (position would be before the window).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Good call. I will fix that up.


for (; i < n; i++)
// Where we are currently pointing in the charWindow as we read in a character at a time.
var currentIndex = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this inside the for initializer?

Copy link
Member Author

@CyrusNajmabadi CyrusNajmabadi Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. it is read below (it is used to figure out how many characters were actually read out).

Copy link
Member Author

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c

return _characterWindow;
}
}
public readonly ReadOnlySpan<char> CurrentWindowSpan => _characterWindow.AsSpan(_positionInText - _characterWindowStartPositionInText);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Good call. I will fix that up.

Copy link
Member Author

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c

CyrusNajmabadi

This comment was marked as off-topic.

CyrusNajmabadi

This comment was marked as off-topic.

@CyrusNajmabadi CyrusNajmabadi requested a review from jjonescz July 23, 2025 09:10
{
get
{
return _positionInText > _textEnd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like the right check. If position is after end, that's shouldn't cause problems. But the _positionInText - _characterWindowStartPositionInText offset computed below can be negative if _characterWindowStartPositionInText is after _positionInText, so I would expect this check to be something like:

Suggested change
return _positionInText > _textEnd
return _positionInText < _characterWindowStartPositionInText

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both woudl actually cause problems (since AsSpan checks start/end pos). I'll validate both sides :)

@CyrusNajmabadi CyrusNajmabadi enabled auto-merge (squash) July 23, 2025 10:16
@CyrusNajmabadi CyrusNajmabadi merged commit a39c638 into dotnet:main Jul 23, 2025
23 of 24 checks passed
@CyrusNajmabadi CyrusNajmabadi deleted the lexerField branch July 23, 2025 11:18
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Jul 23, 2025
@RikkiGibson RikkiGibson modified the milestones: Next, 18.0 P1 Aug 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants