-
Notifications
You must be signed in to change notification settings - Fork 1.6k
<regex>: Small cleanups after completion of non-recursive matcher
#5818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
<regex>: Small cleanups after completion of non-recursive matcher
#5818
Conversation
|
The stack limits were set aggressively to try to avoid actual stack overflows (as exceptions are less bad). In the non-recursive era, should these limits be significantly increased? |
I think so, yeah. But we can make a more informed decision for an appropriate value when the stack frame layout is more settled, because then we can simultaneously replace the extra stack counter by That said, there is even an argument that we should just remove this explicit stack limit completely: The memory usage for stack frames is linear in the size of the input (assuming the regex is constant). So we could tell users that they should limit the size of the input themselves if they want to bound the memory usage. |
|
I would be fine with removing the limit in a future PR. Thanks! |
|
Yeah, let's definitely remove the limit in a followup, since it's still blocking repros like VSO-1054746 / DevCom-885115: #include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
try {
const string str(1000, 'a');
smatch match;
cout << boolalpha;
cout << regex_match(str, match, regex{"a+"}) << "\n";
cout << regex_match(str, match, regex{"(?:a)+"}) << "\n";
} catch (const regex_error& e) {
cout << e.what() << "\n";
}
}(I am sure you knew this already, just recording this for completeness.) |
|
For this specific repro, it's not necessary to lift the stack limit. Instead, we could promote diff --git a/stl/inc/regex b/stl/inc/regex
index f6cb5ee5..06142307 100644
--- a/stl/inc/regex
+++ b/stl/inc/regex
@@ -5354,7 +5354,6 @@ void _Parser2<_FwdIt, _Elem, _RxTraits>::_Calculate_loop_simplicity(
}
break;
- case _N_group:
case _N_capture:
// TRANSITION, requires more research to decide on the subset of loops that we can make simple:
// - Simple mode can square the running time when matching a regex to an input string in the current matcher
@@ -5364,6 +5363,7 @@ void _Parser2<_FwdIt, _Elem, _RxTraits>::_Calculate_loop_simplicity(
}
break;
+ case _N_group:
case _N_none:
case _N_nop:
case _N_bol:This is a change we should do at some point anyway. I have just been holding back on this until the changes to simple loop matching are mostly done. But I have to think this change through first. Maybe we have to reject simple loop status for some node types that can only appear in the repeated pattern when wrapped in a (non-capturing or capturing) group. BTW, does this mean you want to reopen #997? That's the issue for this repro. I closed it because I fixed the stack overflow, but obviously this was replaced by |
|
Thanks for explaining! Yes, I've gone ahead and reopened #997, thanks. The user clearly desires for the matching to succeed; replacing the stack overflow with a |
|
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
🧹 🪄 🧹 |
This PR makes a few small changes, which we can do now that the matcher has become fully non-recursive.
<regex>: Process generic loops non-recursively #5798's fails-for-x64-but-works-for-x86 test to the usual pattern._Matcher3::_Push_frame(): When I added the arguments to_Push_frame(), I used defaults to avoid changing all existing call sites in the recursive matcher code. Now that the recursive matcher code is gone, there is no longer any call site with zero arguments, so the defaults have lost their purpose. (There was one call passing a single argument, which I adjusted.)_N_end + 1is actually a remnant of an early attempt to use a single large switch statement which I abandoned. Now that the matcher has been made completely non-recursive, I think it has become sufficiently clear that_Node_typeand_Rx_unwind_opsare used in separate contexts and there is no danger of confusion. (I still omit0to catch accidental initialization with0)._Initial_frames_countfrom_Matcher3::_Match_pat(): There are no longer any recursive calls, so_Initial_frames_countis always zero now.