@muellerj2
Contributor

This PR makes a few small changes, which we can do now that the matcher has become fully non-recursive.

  • Align x64's max stack limit with x86's: The limit was lower for x64 because the matcher used more stack per recursive call there. Now that the matcher is fully non-recursive, the difference no longer serves a purpose, so we can always use the greater x86 limit. With this change, we can also adjust the fails-for-x64-but-works-for-x86 test from #5798 (<regex>: Process generic loops non-recursively) to the usual pattern.
  • Remove default arguments from _Matcher3::_Push_frame(): When I added the arguments to _Push_frame(), I used defaults to avoid changing all existing call sites in the recursive matcher code. Now that the recursive matcher code is gone, there is no longer any call site with zero arguments, so the defaults have lost their purpose. (There was one call passing a single argument, which I adjusted.)
  • Start unwinding opcodes at 1: The use of _N_end + 1 is a remnant of an early, abandoned attempt to use a single large switch statement. Now that the matcher is completely non-recursive, I think it is sufficiently clear that _Node_type and _Rx_unwind_ops are used in separate contexts and there is no danger of confusion. (I still omit 0 to catch accidental initialization with 0.)
  • Remove _Initial_frames_count from _Matcher3::_Match_pat(): There are no longer any recursive calls, so _Initial_frames_count is always zero now.
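
The "omit 0" point in the third bullet can be illustrated with a minimal sketch. All names here (unwind_op, frame, is_valid, checked_op) are hypothetical stand-ins for illustration only; they are not the actual _Rx_unwind_ops enumerators or _Matcher3 members.

```cpp
#include <cassert>

// Hypothetical opcode enum for illustration; these are not the real _Rx_unwind_ops enumerators.
enum class unwind_op : int {
    // 0 is deliberately unassigned, so a zero-initialized opcode is never a valid operation.
    pop_frame = 1,
    restore_position,
    retry_loop,
};

// True only for the enumerators declared above.
constexpr bool is_valid(unwind_op op) noexcept {
    return op >= unwind_op::pop_frame && op <= unwind_op::retry_loop;
}

// Hypothetical frame record; `op{}` value-initializes the opcode to 0.
struct frame {
    unwind_op op{};
};

// Debug-build guard: trips if a frame reaches the unwinder without a real opcode.
inline unwind_op checked_op(const frame& f) {
    assert(is_valid(f.op));
    return f.op;
}
```

Because no enumerator has the value 0, a frame whose opcode was never assigned fails is_valid() instead of silently behaving like the first real operation.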

@muellerj2 muellerj2 requested a review from a team as a code owner October 30, 2025 19:51
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Oct 30, 2025
@StephanTLavavej StephanTLavavej added the enhancement (Something can be improved) and regex (meow is a substring of homeowner) labels Oct 30, 2025
@StephanTLavavej StephanTLavavej self-assigned this Oct 30, 2025
@StephanTLavavej
Member

The stack limits were set aggressively to try to avoid actual stack overflows (as exceptions are less bad). In the non-recursive era, should these limits be significantly increased?

@muellerj2
Contributor Author

muellerj2 commented Oct 30, 2025

> In the non-recursive era, should these limits be significantly increased?

I think so, yeah. But we can make a more informed decision about an appropriate value once the stack frame layout is more settled, because then we can simultaneously replace the extra stack counter with _Frames_count.

That said, there is even an argument that we should just remove this explicit stack limit completely: The memory usage for stack frames is linear in the size of the input (assuming the regex is constant). So we could tell users that they should limit the size of the input themselves if they want to bound the memory usage.
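
As a rough sketch of what "limit the size of the input themselves" could look like on the caller's side: the helper below is hypothetical (bounded_regex_match and match_result are not part of the STL or of this PR), and the cap is an arbitrary value the caller would have to choose for their own memory budget.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: distinguishes "rejected before matching" from a real answer.
enum class match_result { no_match, match, input_too_long };

// Hypothetical caller-side helper, not an STL API: refuses inputs longer than max_len
// so that memory used by the matcher stays bounded by a limit the caller chose.
inline match_result bounded_regex_match(const std::string& input, const std::regex& re, std::size_t max_len) {
    if (input.size() > max_len) {
        return match_result::input_too_long;
    }
    return std::regex_match(input, re) ? match_result::match : match_result::no_match;
}

// Usage example with an arbitrary cap of 100 characters.
inline void bounded_regex_match_example() {
    const std::regex re("a+");
    assert(bounded_regex_match(std::string(10, 'a'), re, 100) == match_result::match);
    assert(bounded_regex_match(std::string(101, 'a'), re, 100) == match_result::input_too_long);
}
```

The length check runs before any matching starts, so an oversized input never reaches the matcher at all.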

@StephanTLavavej
Member

I would be fine with removing the limit in a future PR. Thanks!

@StephanTLavavej StephanTLavavej removed their assignment Oct 31, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Oct 31, 2025
@StephanTLavavej
Member

Yeah, let's definitely remove the limit in a followup, since it's still blocking repros like VSO-1054746 / DevCom-885115:

D:\GitHub\STL\out\x64>type meow.cpp
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    try {
        const string str(1000, 'a');
        smatch match;
        cout << boolalpha;
        cout << regex_match(str, match, regex{"a+"}) << "\n";
        cout << regex_match(str, match, regex{"(?:a)+"}) << "\n";
    } catch (const regex_error& e) {
        cout << e.what() << "\n";
    }
}
D:\GitHub\STL\out\x64>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od meow.cpp
meow.cpp

D:\GitHub\STL\out\x64>meow
true
regex_error(error_stack): There was insufficient memory to determine whether the regular expression could match the specified character sequence.

(I am sure you knew this already, just recording this for completeness.)

@muellerj2
Contributor Author

For this specific repro, it's not necessary to lift the stack limit. Instead, we could promote (?:a)+ to a simple loop:

diff --git a/stl/inc/regex b/stl/inc/regex
index f6cb5ee5..06142307 100644
--- a/stl/inc/regex
+++ b/stl/inc/regex
@@ -5354,7 +5354,6 @@ void _Parser2<_FwdIt, _Elem, _RxTraits>::_Calculate_loop_simplicity(
             }
             break;

-        case _N_group:
         case _N_capture:
             // TRANSITION, requires more research to decide on the subset of loops that we can make simple:
             // - Simple mode can square the running time when matching a regex to an input string in the current matcher
@@ -5364,6 +5363,7 @@ void _Parser2<_FwdIt, _Elem, _RxTraits>::_Calculate_loop_simplicity(
             }
             break;

+        case _N_group:
         case _N_none:
         case _N_nop:
         case _N_bol:

This is a change we should make at some point anyway. I have just been holding off until the changes to simple loop matching are mostly done.

But I have to think this change through first. Maybe we have to reject simple loop status for some node types that can only appear in the repeated pattern when wrapped in a (non-capturing or capturing) group.
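
For the specific pattern in the repro, the promotion looks plausible because a non-capturing group around a single character does not change which strings match. The sketch below only checks that observable equivalence through the public std::regex_match interface; patterns_agree is a hypothetical helper, and the check says nothing about other node types that may appear inside a group.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns true if both patterns give the same regex_match answer
// for every string of n 'a's (0 <= n <= max_len) and for the same string followed by 'b'.
inline bool patterns_agree(const std::string& lhs, const std::string& rhs, std::size_t max_len) {
    const std::regex left(lhs);
    const std::regex right(rhs);
    for (std::size_t n = 0; n <= max_len; ++n) {
        const std::string all_a(n, 'a');
        const std::string with_b = all_a + 'b';
        if (std::regex_match(all_a, left) != std::regex_match(all_a, right)) {
            return false;
        }
        if (std::regex_match(with_b, left) != std::regex_match(with_b, right)) {
            return false;
        }
    }
    return true;
}

// Usage example: the grouped and ungrouped loops accept the same short inputs.
inline void check_group_equivalence() {
    assert(patterns_agree("a+", "(?:a)+", 32));
}
```

The inputs are kept short on purpose so the check stays well below any stack limit, regardless of how the matcher treats the grouped loop.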

BTW, does this mean you want to reopen #997? That's the issue for this repro. I closed it because I fixed the stack overflow, but the overflow was merely replaced by regex_error(error_stack).

@StephanTLavavej
Member

Thanks for explaining!

Yes, I've gone ahead and reopened #997, thanks. The user clearly wants the matching to succeed; replacing the stack overflow with a regex_error is certainly an improvement, but it doesn't quite let me resolve our accumulated bugs.

@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Nov 4, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit cf36aa9 into microsoft:main Nov 5, 2025
41 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Nov 5, 2025
@StephanTLavavej
Member

🧹 🪄 🧹
