-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Implement RegExp dotAll flag (/s) #5592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dc42990 to
f4d6c0e
Compare
| r.sticky = true; // no-op | ||
| r.dotAll = true; // no-op | ||
| assert.areEqual(expectedUnicode, r.unicode, r + ": unicode"); | ||
| assert.areEqual(expectedSticky, r.sticky, r + ": sticky"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you could add a third assertion here for expectedDotAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops I thought I had I'll update that.
| case PropertyIds::options: | ||
| IS_WRITABLE(!scriptConfig->IsES6RegExPrototypePropertiesEnabled()) | ||
| case PropertyIds::dotAll: | ||
| IS_WRITABLE(!scriptConfig->IsES9RegExDotAllEnabled()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this one not follow the pattern of the others? In every other method the corresponding case looks like (scriptConfig->IsES9RegExDotAllEnabled() && !scriptConfig->IsES6RegExPrototypePropertiesEnabled()).
| { | ||
| library->regexDotAllGetterFunction = | ||
| library->AddGetterToLibraryObject(regexPrototype, PropertyIds::dotAll, &JavascriptRegExp::EntryInfo::GetterDotAll); | ||
| library->regexDotAllGetterSlotIndex = 21; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if dotAll is enabled but unicode or sticky is not? Would this index be wrong?
| bool JavascriptLibrary::InitializeRegexPrototype(DynamicObject* regexPrototype, DeferredTypeHandlerBase * typeHandler, DeferredInitializeMode mode) | ||
| { | ||
| typeHandler->Convert(regexPrototype, mode, 24); | ||
| typeHandler->Convert(regexPrototype, mode, 25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want 26 here because I think a getter/setter pair (even a getter/setter with undefined setter) takes two slots. You could verify by stepping through and seeing if the type handler has to grow when starting with 25 and adding the last thing (_symbolSplit).
sethbrenith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
boingoing
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
|
@dilijev Can you take a look at this change? |
|
We generally only use ES now because ES6 was the last version where people universally used ES to refer to the versions. There's already precedent in our host flags for "ES2018" ( I'll run this through internal tests. |
lib/Parser/RegexRuntime.cpp
Outdated
| if ((flags & GlobalRegexFlag) != 0) w->Print(_u("global ")); | ||
| if ((flags & MultilineRegexFlag) != 0) w->Print(_u("multiline ")); | ||
| if ((flags & IgnoreCaseRegexFlag) != 0) w->Print(_u("ignorecase")); | ||
| if ((flags & DotAllRegexFlag) != 0) w->Print(_u("dotAll")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ((flags & DotAllRegexFlag) != 0) w->Print(_u("dotAll")); [](start = 8, length = 59)
nit: all of these strings should have a space at the end (like "global " above) so that they are spaced out correctly when printed (obviously you don't need to change all of them, but if you could change this one, I'd appreciate it)
lib/Parser/StandardChars.cpp
Outdated
|
|
||
| void StandardChars<char16>::SetFullSet(ArenaAllocator* setAllocator, CharSet<Char> &set) | ||
| { | ||
| set.SetRange(allocator, MinChar, MaxChar); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set.SetRange [](start = 8, length = 12)
This is fine as-is, but for memory usage reasons (and some runtime efficiency), I'd prefer if you could make the set the equivalent of [^<none>] instead of the equivalent of [\x00-\u{1FFFFF}] as you've done here. You should be able to use something like SetNotRanges(0, "") to accomplish this, but if there's some assertions or preconditions there that you're not using empty string, you might need a new method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetNotRanges with 0 sorted pairs currently literally does what I've done here:
https://github.com/Microsoft/ChakraCore/blob/bffb8850cd491ba148b98c4ee9f322312d374c56/lib/Parser/CharSet.cpp#L1137-L1150
All the NotRanges seem to be handled in that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change dotAll one to use SetNotRanges so if that method is updated to use negated sets this will benefit from the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
| case PropertyIds::options: | ||
| SET_PROPERTY(!scriptConfig->IsES6RegExPrototypePropertiesEnabled()); | ||
| case PropertyIds::dotAll: | ||
| SET_PROPERTY(scriptConfig->IsES9RegExDotAllEnabled() && !scriptConfig->IsES6RegExPrototypePropertiesEnabled()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ES9 [](start = 41, length = 3)
does this ES2018 feature check make sense with the ES6 check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess this would have to be blocked on the same feature.
In reply to: 210786581 [](ancestors = 210786581)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not blocked on the ES6 feature check it's rather that all of these properties behave differently without the ES6 Prototype properties enabled.
With ES2018RegExDotAllEnabled and ES6RegExPrototypePropertiesEnabled the prototype has a dotAll getter only property which returns if the RegEx it's called on has the dotAll flag.
If dotAll is enabled but the Prototype properties are not enabled it's a property directly on the RegExp object which returns true (as opposed to a getter from the prototype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I had a look and found the same. Thanks for clarifying.
dilijev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! Looks good overall, and passed internal tests. There's just a few things you should look into, and you'll need to rebase and regen bytecode again on top of that to "resolve" the bytecode conflicts.
|
@dilijev thanks for the review, I've addressed most of your points and updated to latest. I've held off on squashing for now in case you wish to check the new changes separately. But I'm afraid I don't know how to address the SetRange point - see my comment above. |
dilijev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Looks like #5584 caused a bytecode conflict again. Sorry about that. I'm going to tag pull requests that regen bytecode so we can pay closer attention to serializing such PRs in a way that reduces external contributor pain. /cc @jackhorton |
|
Thanks @dilijev Hopefully this is good to merge now (Note I've left the Bytecode regen as a separate commit so it can be easily dropped if there's another merge conflict - obviously either this or Object.fromEntries will have to have another regen...) |
Merge pull request #5592 from rhuanjl:dotall This PR implements the es2018 RegExp /s dotAll flag. See proposal spec here: https://tc39.github.io/proposal-regexp-dotall-flag/ Notes: 1. I've given this the flag ES9RegExDotAll but I've made it default to enabled 2. When run with the ES6RegExPrototypeProperties this passes all relevant test262 tests, without that flag several test262 tests fail (hopefully that can be enabled soon) fixes: #2787 cc: @mathiasbynens
This PR implements the es2018 RegExp /s dotAll flag.
See proposal spec here: https://tc39.github.io/proposal-regexp-dotall-flag/
Notes:
fixes: #2787
cc: @mathiasbynens