Skip to content

Conversation

@rhuanjl
Copy link
Collaborator

@rhuanjl rhuanjl commented Aug 12, 2018

This PR implements the es2018 RegExp /s dotAll flag.

See proposal spec here: https://tc39.github.io/proposal-regexp-dotall-flag/

Notes:

  1. I've given this the flag ES9RegExDotAll but I've made it default to enabled
  2. When run with the ES6RegExPrototypeProperties this passes all relevant test262 tests, without that flag several test262 tests fail (hopefully that can be enabled soon)

fixes: #2787

cc: @mathiasbynens

@rhuanjl rhuanjl force-pushed the dotall branch 5 times, most recently from dc42990 to f4d6c0e Compare August 12, 2018 09:30
r.sticky = true; // no-op
r.dotAll = true; // no-op
assert.areEqual(expectedUnicode, r.unicode, r + ": unicode");
assert.areEqual(expectedSticky, r.sticky, r + ": sticky");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you could add a third assertion here for expectedDotAll

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I thought I had I'll update that.

case PropertyIds::options:
IS_WRITABLE(!scriptConfig->IsES6RegExPrototypePropertiesEnabled())
case PropertyIds::dotAll:
IS_WRITABLE(!scriptConfig->IsES9RegExDotAllEnabled())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this one not follow the pattern of the others? In every other method the corresponding case looks like (scriptConfig->IsES9RegExDotAllEnabled() && !scriptConfig->IsES6RegExPrototypePropertiesEnabled()).

{
library->regexDotAllGetterFunction =
library->AddGetterToLibraryObject(regexPrototype, PropertyIds::dotAll, &JavascriptRegExp::EntryInfo::GetterDotAll);
library->regexDotAllGetterSlotIndex = 21;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if dotAll is enabled but unicode or sticky is not? Would this index be wrong?

bool JavascriptLibrary::InitializeRegexPrototype(DynamicObject* regexPrototype, DeferredTypeHandlerBase * typeHandler, DeferredInitializeMode mode)
{
typeHandler->Convert(regexPrototype, mode, 24);
typeHandler->Convert(regexPrototype, mode, 25);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want 26 here because I think a getter/setter pair (even a getter/setter with undefined setter) takes two slots. You could verify by stepping through and seeing if the type handler has to grow when starting with 25 and adding the last thing (_symbolSplit).

Copy link
Contributor

@sethbrenith sethbrenith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@boingoing boingoing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@kfarnung
Copy link
Contributor

@dilijev Can you take a look at this change?

@dilijev
Copy link
Contributor

dilijev commented Aug 17, 2018

We generally only use ES now because ES6 was the last version where people universally used ES to refer to the versions. There's already precedent in our host flags for "ES2018" (-ES2018ObjectSpread).

I'll run this through internal tests.

if ((flags & GlobalRegexFlag) != 0) w->Print(_u("global "));
if ((flags & MultilineRegexFlag) != 0) w->Print(_u("multiline "));
if ((flags & IgnoreCaseRegexFlag) != 0) w->Print(_u("ignorecase"));
if ((flags & DotAllRegexFlag) != 0) w->Print(_u("dotAll"));
Copy link
Contributor

@dilijev dilijev Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ((flags & DotAllRegexFlag) != 0) w->Print(_u("dotAll")); [](start = 8, length = 59)

nit: all of these strings should have a space at the end (like "global " above) so that they are spaced out correctly when printed (obviously you don't need to change all of them, but if you could change this one, I'd appreciate it)


void StandardChars<char16>::SetFullSet(ArenaAllocator* setAllocator, CharSet<Char> &set)
{
set.SetRange(allocator, MinChar, MaxChar);
Copy link
Contributor

@dilijev dilijev Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set.SetRange [](start = 8, length = 12)

This is fine as-is, but for memory usage reasons (and some runtime efficiency), I'd prefer if you could make the set the equivalent of [^<none>] instead of the equivalent of [\x00-\u{1FFFFF}] as you've done here. You should be able to use something like SetNotRanges(0, "") to accomplish this, but if there's some assertions or preconditions there that you're not using empty string, you might need a new method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetNotRanges with 0 sorted pairs currently literally does what I've done here:
https://github.com/Microsoft/ChakraCore/blob/bffb8850cd491ba148b98c4ee9f322312d374c56/lib/Parser/CharSet.cpp#L1137-L1150

All the NotRanges seem to be handled in that way.

Copy link
Contributor

@dilijev dilijev Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that is surprising to me since we have a concept of a negated set which is more memory efficient. Possibly that was a later addition than the sets themselves. I'll look into this at a later time; won't block the PR on it. /cc @kfukuda2
Created #5633 to track

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change dotAll one to use SetNotRanges so if that method is updated to use negated sets this will benefit from the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

case PropertyIds::options:
SET_PROPERTY(!scriptConfig->IsES6RegExPrototypePropertiesEnabled());
case PropertyIds::dotAll:
SET_PROPERTY(scriptConfig->IsES9RegExDotAllEnabled() && !scriptConfig->IsES6RegExPrototypePropertiesEnabled());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ES9 [](start = 41, length = 3)

does this ES2018 feature check make sense with the ES6 check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess this would have to be blocked on the same feature.


In reply to: 210786581 [](ancestors = 210786581)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not blocked on the ES6 feature check it's rather that all of these properties behave differently without the ES6 Prototype properties enabled.

With ES2018RegExDotAllEnabled and ES6RegExPrototypePropertiesEnabled the prototype has a dotAll getter only property which returns if the RegEx it's called on has the dotAll flag.

If dotAll is enabled but the Prototype properties are not enabled it's a property directly on the RegExp object which returns true (as opposed to a getter from the prototype)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I had a look and found the same. Thanks for clarifying.

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Looks good overall, and passed internal tests. There's just a few things you should look into, and you'll need to rebase and regen bytecode again on top of that to "resolve" the bytecode conflicts.

@rhuanjl
Copy link
Collaborator Author

rhuanjl commented Aug 17, 2018

@dilijev thanks for the review, I've addressed most of your points and updated to latest. I've held off on squashing for now in case you wish to check the new changes separately.

But I'm afraid I don't know how to address the SetRange point - see my comment above.

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dilijev
Copy link
Contributor

dilijev commented Aug 25, 2018

Looks like #5584 caused a bytecode conflict again. Sorry about that. I'm going to tag pull requests that regen bytecode so we can pay closer attention to serializing such PRs in a way that reduces external contributor pain. /cc @jackhorton

@dilijev dilijev added the Bytecode-Update This PR updates bytecode and will cause merge conflicts with other PRs with this label label Aug 25, 2018
@rhuanjl
Copy link
Collaborator Author

rhuanjl commented Aug 25, 2018

Thanks @dilijev
Any recent pain on ByteCode regens has likely been partly my fault considering I've got this one + Object.fromEntries open and I also recently had merged Array.flat/flatMap.

Hopefully this is good to merge now (Note I've left the Bytecode regen as a separate commit so it can be easily dropped if there's another merge conflict - obviously either this or Object.fromEntries will have to have another regen...)

@chakrabot chakrabot merged commit 4bcc8d1 into chakra-core:master Aug 28, 2018
chakrabot pushed a commit that referenced this pull request Aug 28, 2018
Merge pull request #5592 from rhuanjl:dotall

This PR implements the es2018 RegExp /s dotAll flag.

See proposal spec here: https://tc39.github.io/proposal-regexp-dotall-flag/

Notes:
1. I've given this the flag ES9RegExDotAll but I've made it default to enabled
2. When run with the ES6RegExPrototypeProperties this passes all relevant test262 tests, without that flag several test262 tests fail (hopefully that can be enabled soon)

fixes: #2787

cc: @mathiasbynens
@rhuanjl rhuanjl deleted the dotall branch August 28, 2018 05:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bytecode-Update This PR updates bytecode and will cause merge conflicts with other PRs with this label External Contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RegExp: Implement dotall (/s) flag

6 participants