
@stephentoub (Member)

Regex timeouts have never been about guaranteeing exact timeout handling; rather, they're about avoiding catastrophic backtracking. As such, we already allow an O(n) amount of work in many cases between timeout checks. This change formalizes that: we now check for a timeout at every place where we could go on to do at least an O(n) amount of work, which essentially means every time we start matching at a new index and every time we backtrack. It also removes the counting logic that previously turned only 1 out of every 1000 CheckTimeout calls into an actual timeout check; now every CheckTimeout call queries the current tick count.
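To make the placement concrete, here is a minimal Python sketch built on a tiny Pike-style backtracking matcher (literals, `.`, `*`, `^`, `$`), not on .NET's actual Regex engines. `BacktrackingMatcher`, `RegexMatchTimeoutError`, and `timeout_checks` are hypothetical names, and `time.monotonic_ns` stands in for the tick count. It reads the clock when a match attempt starts at a new index and each time a `*` loop gives back a character, and leaves the initial greedy scan of the loop unchecked.

```python
import time


class RegexMatchTimeoutError(Exception):
    """Raised when a match runs past its deadline (hypothetical name)."""


class BacktrackingMatcher:
    """Tiny Pike-style backtracking matcher: literals, '.', '*', '^', '$'.

    Timeout checks are placed only where an O(n) amount of further work
    could follow: at each new starting index, and each time a '*' loop
    gives back a character (a backtrack).
    """

    def __init__(self, pattern, timeout_seconds):
        self.pattern = pattern
        self.timeout_ns = int(timeout_seconds * 1_000_000_000)
        self.deadline_ns = 0
        self.timeout_checks = 0  # number of clock reads, for illustration

    def check_timeout(self):
        # No skip counter: every call reads the (cheap) monotonic clock.
        self.timeout_checks += 1
        if time.monotonic_ns() >= self.deadline_ns:
            raise RegexMatchTimeoutError(self.pattern)

    def search(self, text):
        self.deadline_ns = time.monotonic_ns() + self.timeout_ns
        if self.pattern.startswith("^"):
            self.check_timeout()  # the single anchored attempt
            return self._match_here(self.pattern[1:], text, 0)
        for start in range(len(text) + 1):
            self.check_timeout()  # starting a match at a new index
            if self._match_here(self.pattern, text, start):
                return True
        return False

    def _match_here(self, p, text, i):
        if not p:
            return True
        if len(p) >= 2 and p[1] == "*":
            return self._match_star(p[0], p[2:], text, i)
        if p == "$":
            return i == len(text)
        if i < len(text) and p[0] in (".", text[i]):
            return self._match_here(p[1:], text, i + 1)
        return False

    def _match_star(self, c, rest, text, i):
        # Initial greedy scan of the loop: linear and unchecked (hot path).
        j = i
        while j < len(text) and c in (".", text[j]):
            j += 1
        # Back off one character at a time; each give-back is a backtrack.
        while True:
            if self._match_here(rest, text, j):
                return True
            if j == i:
                return False
            j -= 1
            self.check_timeout()
```

For example, `BacktrackingMatcher("a*b", 1.0).search("aaa")` returns `False` after 10 clock reads: one for each of the 4 starting indices and one for each of the 6 characters given back across those attempts.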

This will fail CI until #68138 is merged.

(This does not change how we do timeout checks in the NonBacktracking implementation. Based on the above criteria, timeout checks in NonBacktracking are optional, since that engine never backtracks. However, we'll likely want to continue doing them periodically, for consistency and some level of predictability.)
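By contrast, an automaton-based matcher does constant work per input character, so a timeout check there only bounds how far past the deadline a very long input can run. The Python sketch below assumes a hand-built DFA, not the way the NonBacktracking engine actually constructs its automata, and checks the clock every `check_interval` characters; `dfa_full_match`, `AB_STAR_C`, `ACCEPTING`, and `check_interval` are hypothetical.

```python
import time


def dfa_full_match(transitions, start, accepting, text, timeout_seconds,
                   check_interval=1000):
    """Full-match text against a DFA in one left-to-right pass.

    Each character costs O(1), so there is no backtracking to guard
    against; the periodic check just bounds how far past the deadline a
    very long input can run.
    """
    deadline = time.monotonic() + timeout_seconds
    state = start
    for i, ch in enumerate(text):
        if i % check_interval == 0 and time.monotonic() >= deadline:
            raise TimeoutError("match timed out")
        state = transitions.get((state, ch))
        if state is None:
            return False  # missing transition: dead state, no match possible
    return state in accepting


# Hand-built DFA for the pattern ab*c (full match): 0 -a-> 1, 1 -b-> 1, 1 -c-> 2.
AB_STAR_C = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
ACCEPTING = {2}
```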

@ghost commented Apr 18, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: stephentoub
Labels: area-System.Text.RegularExpressions
Milestone: -

@joperezr (Member)

I suppose you have run them already, but can you share some numbers from our existing perf benchmarks showing how much impact (if any, positive or negative) these timeout changes have?

@stephentoub (Member, Author)

> I suppose you have run them already

I ran a worst-case test that forces a timeout check at essentially every step, and I saw no measurable impact from calling Environment.TickCount64 at every check rather than maintaining a counter and calling Environment.TickCount on only 1 of every 1000 checks: TickCount and TickCount64 are cheap, and the other costs involved simply dominate. This change also removes checks from hotter, lower-overhead paths, such as the initial linear match of a loop, so those paths no longer pay any timeout overhead at all.
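The two checking schemes can be contrasted with a small Python sketch. The class and function names are hypothetical, `time.monotonic` stands in for `Environment.TickCount`/`TickCount64`, and this is not the actual .NET implementation. The counting checker only consults the clock on every 1000th call, so with an already-expired deadline it performs 999 more checks' worth of work before noticing; the direct checker notices on its first call.

```python
import time


class CountingTimeoutChecker:
    """Sketch of the old scheme: only 1 of every 1000 calls reads the clock."""

    CALLS_PER_CLOCK_READ = 1000

    def __init__(self, timeout_seconds):
        self.deadline = time.monotonic() + timeout_seconds
        self.calls_left = self.CALLS_PER_CLOCK_READ

    def check_timeout(self):
        self.calls_left -= 1
        if self.calls_left > 0:
            return  # most calls skip the clock entirely
        self.calls_left = self.CALLS_PER_CLOCK_READ
        if time.monotonic() >= self.deadline:
            raise TimeoutError("match timed out")


class DirectTimeoutChecker:
    """Sketch of the new scheme: every call reads the clock."""

    def __init__(self, timeout_seconds):
        self.deadline = time.monotonic() + timeout_seconds

    def check_timeout(self):
        if time.monotonic() >= self.deadline:
            raise TimeoutError("match timed out")


def calls_until_timeout(checker, max_calls=10_000):
    """Return the 1-based call on which the checker first raises, or None."""
    for call in range(1, max_calls + 1):
        try:
            checker.check_timeout()
        except TimeoutError:
            return call
    return None


# With an already-expired deadline, the counting checker notices only on
# its 1000th call, while the direct checker notices immediately.
print(calls_until_timeout(CountingTimeoutChecker(0)))  # 1000
print(calls_until_timeout(DirectTimeoutChecker(0)))    # 1
```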

@joperezr left a comment

Minor comments, but looks good otherwise. We'll probably want to update our docs in dotnet/docs as well as dotnet/dotnet-api-docs once this goes in.

@stephentoub merged commit a88e4f5 into dotnet:main Apr 20, 2022
@stephentoub deleted the redotimeouts branch April 20, 2022 10:14
directhex pushed a commit to directhex/runtime that referenced this pull request Apr 21, 2022
* Overhaul when/where we check for timeouts

* Address PR feedback
@ghost locked as resolved and limited conversation to collaborators May 20, 2022