-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Overhaul when/where we check for Regex timeouts #68146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions Issue DetailsRegex timeouts have never been about guaranteeing exact timeout handling; rather, they're about avoiding catastrophic backtracking. As such, we already allow an O(n) amount of work in many cases between timeout checks. This change formalizes that, such that we now check for a timeout at every place where we could do at least an O(n) amount of work, which essentially means every time we match at a new index and every time we backtrack. It also removes the counting logic that previously translated only 1 out of 1000 CheckTimeout calls into a timeout check; now every CheckTimeout will query the current tick count. This will fail CI until #68138 is merged. (This does not change how we do timeout checks in the NonBacktracking implementation. Based on the above criteria, timeout checks in NonBacktracking are optional. However, we'll likely want to continue doing them periodically, for consistency and some level of predictability.)
|
|
I suppose you have run them already, but can you share some perf numbers from our existing benchmarks to see how much (if any) impact (positive or negative) these changes around timeout might have? |
I ran a worst-case test that would cause us to perform a timeout check at essentially every step, and I saw no measurable impact from calling Environment.TickCount64 at each check rather than maintaining a counter and calling Environment.TickCount 1/1000 such checks; TickCount{64} is cheap, and the other costs involved simply dominate. This also removes the checks on paths that have less overhead and are hotter, e.g. as part of doing the initial linear match of a loop, so all of the timeout overhead goes away in those cases. |
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.Timeout.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.Timeout.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.Timeout.cs
Outdated
Show resolved
Hide resolved
...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs
Outdated
Show resolved
Hide resolved
joperezr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments but looks good otherwise. We will probably want to update our docs in dotnet/docs as well as dotnet/dotnet-api-docs once this goes in.
Regex timeouts have never been about guaranteeing exact timeout handling; rather, they're about avoiding catastrophic backtracking. As such, we already allow an O(n) amount of work in many cases between timeout checks. This change formalizes that, such that we now check for a timeout at every place where we could do at least an O(n) amount of work, which essentially means every time we match at a new index and every time we backtrack. It also removes the counting logic that previously translated only 1 out of 1000 CheckTimeout calls into a timeout check; now every CheckTimeout will query the current tick count.
7eb7149 to
9b8ca87
Compare
* Overhaul when/where we check for timeouts Regex timeouts have never been about guaranteeing exact timeout handling; rather, they're about avoiding catastrophic backtracking. As such, we already allow an O(n) amount of work in many cases between timeout checks. This change formalizes that, such that we now check for a timeout at every place where we could do at least an O(n) amount of work, which essentially means every time we match at a new index and every time we backtrack. It also removes the counting logic that previously translated only 1 out of 1000 CheckTimeout calls into a timeout check; now every CheckTimeout will query the current tick count. * Address PR feedback
Regex timeouts have never been about guaranteeing exact timeout handling; rather, they're about avoiding catastrophic backtracking. As such, we already allow an O(n) amount of work in many cases between timeout checks. This change formalizes that, such that we now check for a timeout at every place where we could do at least an O(n) amount of work, which essentially means every time we match at a new index and every time we backtrack. It also removes the counting logic that previously translated only 1 out of 1000 CheckTimeout calls into a timeout check; now every CheckTimeout will query the current tick count.
This will fail CI until #68138 is merged.
(This does not change how we do timeout checks in the NonBacktracking implementation. Based on the above criteria, timeout checks in NonBacktracking are optional. However, we'll likely want to continue doing them periodically, for consistency and some level of predictability.)