We were translating the pattern [\⪾-\⫀] to [\\u2abe-\\u2ac0], which
is a very different pattern; as a code unit converted to the \uhhh
format has no meaning when escaped, this commit makes us simply skip
escaping it when translating the pattern.
This makes it ever-so-slightly faster, but more importantly, it fixes
the bug where a `/\//` regex's `source` property would return `\\/`
("\\\\/") instead of `\/` due to the existing '/' -> '\/' replace()
call.
This is a normative change in the ECMA-262 spec. See:
https://github.com/tc39/ecma262/commit/35b7eb2
Note there is a bit of weirdness between the mainline spec and the set
notation proposal as the latter has not been updated with this change.
For now, this implements what the spec PR and other prototypes indicate
how the proposal will behave.
We move the input string into this field to avoid a string copy, so we
must do this step last to avoid using any views into it (note that
match.view here is a view into this string).
The spec requires that invalid RegExp literals must cause a Syntax Error
before the JavaScript is executed. See:
https://tc39.es/ecma262/#sec-patterns-static-semantics-early-errors
This is explicitly tested in the RegExp/property-escapes test262 tests.
For example, see unsupported-property-Line_Break.js:
$DONOTEVALUATE();
/\p{Line_Break}/u;
That RegExp literal is invalid because Line_Break is not a supported
Unicode property. $DONOTEVALUATE() just throws an exception when it is
executed. The test expects that this file will fail to be parsed.
Note that RegExp patterns can still be parsed at execution time by way
of "new RegExp(...)".
This allows passing an existing RegExp object (or an object that is
sufficiently like a RegExp object) as the "pattern" argument of the
RegExp constructor.
This is a partial revert of commit 60064e2, which removed the validation
of RegExp flags during runtime and expected the parser to do that
exclusively - however this was not taking into account the RegExp()
constructor, which was subsequently crashing on invalid flags.
Also adds test for these constructor error cases, which were obviously
missing before.
Fixes#7042.
This patch changes the validation of RegExp flags (checking for
invalid and duplicate values) from a SyntaxError at runtime to a
SyntaxError at parse time - it's not something that's supposed to be
catchable.
As a nice side effect, this simplifies the RegExpObject constructor a
bit, as it can no longer throw an exception and doesn't have to validate
the flags itself.
This would crash on an undefined match (no match), since the matched
result was assumed to be a string (such as on discord.com). The spec
suggests converting it to a string as well:
https://tc39.es/ecma262/#sec-regexp.prototype-@@replace (14#c)
By using regex::AllFlags::SkipTrimEmptyMatches we get a null string for
unmatched capture groups, which we then turn into an undefined entry in
the result array instead of putting all matches first and appending
undefined for the remaining number of capture groups - e.g. for
/foo(ba((r)|(z)))/.exec("foobaz")
we now return
["foobaz", "baz", "z", undefined, "z"]
and not [
["foobaz", "baz", "z", "z", undefined]
Fixes part of #6042.
Also happens to fix selecting an element by ID using jQuery's $("#foo").
This only applies to the ECMA262 parser.
This behaviour is an ECMA262-specific quirk, such references always
generate zero-length matches (even on subsequent passes).
Also adds a test in LibJS's test suite.
Fixes#6039.
This was not implementing the following part of the spec correctly:
27. For each integer i such that i ≥ 1 and i ≤ n, do
a. Let captureI be ith element of r's captures List.
b. If captureI is undefined, let capturedValue be undefined.
Expecting a capture group match to exist for each of the RegExp's
capture groups would assert in Vector's operator[] if that's not the
case, for example:
/(foo)(bar)?/.exec("foo")
Append undefined instead.
Fixes#5256.