We were mistakenly trying to append UTF-16 code units to a StringBuilder
via the append(char) API. This patch fixes that by accumulating the
result in a Vector<u16> instead.
This'll be a bit worse for performance, since we're now doing additional
UTF-16 string conversions, but we're going for correctness at this stage
and can worry about performance later.
This follows the ECMA402 spec and means String.prototype.localeCompare
will automatically become actually locale aware once StringCompare is
actually implemented based on UTS #10.
This partially reverts commit a962ee020a.
When the sticky bit is set, the global bit should basically be ignored
except by external callers who want their own special behavior. For
example, RegExp.prototype [ @@match ] will use the global flag to
accumulate consecutive matches. But on the first failure, the regex
loop should break.
length_in_code_units() returns a size_t, which is 64-bit unsigned
in i686 builds. `size + (i32)int_length` hence produced a 64-bit
unsigned result, so a negative value would wrap around and become
a very large number.
As fix, just omit the cast -- we assign the result of max() to
a double anyways.
With this, all test262 tests in annexB/built-ins/String/prototype pass.
Unfortunately, this requires a slight divergence in the way the capture
group names are stored. Previously, the generated byte code would simply
store a view into the regex pattern string, so no string copying was
required.
Now, the escape sequences are decoded into a new string, and a vector
of all parsed capture group names are stored in a vector in the parser
result structure. The byte code then stores a view into the
corresponding string in that vector.
This also converts the GetSubstitution abstract operation take its input
strings as UTF-16 now that all callers are UTF-16 capable. This means
String.prototype.replace (and replaceAll) no longer needs UTF-8 and
UTF-16 copies of these strings.
This also implements the CodePointAt abstract operation. This is needed
to handle invalid code units specific to the JavaScript spec, rather
than e.g. inserting replacement code units. This abstraction is public
because RegExp.prototype will also need it.
These methods did not require UTF-16 views, so just add test cases to
ensure they remain correct.
This also adds a couple of FIXME comments on tests that will fail even
with UTF-16 String.prototype support (for reasons such as lack of UTF-16
support in RegExp.prototype and Unicode case folding).
This only causes 1 new test262 test to pass. Other tests that rely on
this coercion fail due to receiving an unexpected value for 'this' when
invoking a functional replacement. For example:
String/prototype/replaceAll/replaceValue-call-matching-empty.js
Receives 'undefined' for 'this' in the functional replacement invocation
but is expected to receive the global 'this'.
The String.prototype.replace spec requires evaluating the replacement
value (if it is not a function) before searching the source string.
Fixes 4 test262 tests.
Per the spec, before invoking the GetSubstitution abstraction, the named
capture groups (if not undefined) should be coerced to an object via the
ToObject abstraction.
The implementation of String.prototype.replaceAll cannot use AK's
implementation of String::find_all when finding the indices of the
search string in the source string. String::find_all will return indices
[0, 1] for String("aaa").find_all("aa") - i.e. it returns overlapping
results. This is not allowed by the JavaScript specification for
replaceAll.