Commit graph

60 commits

Author SHA1 Message Date
Timothy Flynn
3ee5217adc LibJS: Implement String.prototype.toWellFormed 2022-12-01 17:03:55 +01:00
Timothy Flynn
0bb46235a7 LibJS: Implement String.prototype.isWellFormed 2022-12-01 17:03:55 +01:00
Andreas Kling
f7a252ae85 LibJS: Fix UTF-16 corruption in String.prototype.replace()
We were mistakenly trying to append UTF-16 code units to a StringBuilder
via the append(char) API. This patch fixes that by accumulating the
result in a Vector<u16> instead.

This'll be a bit worse for performance, since we're now doing additional
UTF-16 string conversions, but we're going for correctness at this stage
and can worry about performance later.
2022-11-19 11:30:06 -07:00
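The failure mode can be sketched with standard C++ containers. This is only an illustration, assuming std::string and std::vector<uint16_t> as stand-ins for AK's StringBuilder and Vector<u16>; both function names are hypothetical and not taken from the patch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the buggy path: every UTF-16 code unit is
// narrowed to a single char, so values above 0xFF lose their high byte.
std::string append_as_chars(std::vector<uint16_t> const& code_units)
{
    std::string result;
    for (uint16_t unit : code_units)
        result.push_back(static_cast<char>(unit)); // keeps only the low 8 bits
    return result;
}

// Hypothetical stand-in for the fixed path: accumulate whole code units.
std::vector<uint16_t> append_as_code_units(std::vector<uint16_t> const& code_units)
{
    std::vector<uint16_t> result;
    result.reserve(code_units.size());
    for (uint16_t unit : code_units)
        result.push_back(unit);
    return result;
}
```

With U+20AC (code unit 0x20AC) as input, the first function stores the single byte 0xAC, while the second preserves 0x20AC.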
Timothy Flynn
7fc03e8967 LibJS: Use Unicode normalization within String.prototype.normalize 2022-10-06 22:14:44 +01:00
davidot
446a10a1ac LibJS: Implement normative change in String.prototype.substr
And add spec comments while we're in the neighborhood.
2022-09-21 16:59:58 +01:00
stelar7
771e3b9868 LibJS: Stub out String.prototype.normalize 2022-06-02 23:04:27 +01:00
Idan Horowitz
7ae2debf6e LibJS: Re-implement String.localeCompare using the StringCompare AO
This follows the ECMA-402 spec and means String.prototype.localeCompare
will automatically become locale aware once StringCompare is
actually implemented based on UTS #10.
2022-02-20 22:05:59 -05:00
Timothy Flynn
27d3de1f17 LibRegex: Do not continue searching input when the sticky bit is set
This partially reverts commit a962ee020a.

When the sticky bit is set, the global bit should basically be ignored
except by external callers who want their own special behavior. For
example, RegExp.prototype [ @@match ] will use the global flag to
accumulate consecutive matches. But on the first failure, the regex
loop should break.
2022-02-05 19:06:50 +03:30
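A toy sketch of the loop behavior described above, assuming a literal pattern instead of a compiled regex. find_literal is a hypothetical helper, not LibRegex's matcher.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Returns the index of the first match at or after `start`, if any.
// Toy literal matcher; the real engine executes regex byte code instead.
std::optional<size_t> find_literal(std::string_view input, std::string_view pattern, size_t start, bool sticky)
{
    assert(start <= input.size());
    for (size_t index = start; index + pattern.size() <= input.size(); ++index) {
        if (input.substr(index, pattern.size()) == pattern)
            return index;
        // With the sticky bit set, a failure at the starting position ends
        // the search instead of advancing through the rest of the input.
        if (sticky)
            break;
    }
    return std::nullopt;
}
```

Searching "xxab" for "ab" from index 0 finds index 2 without the sticky flag and finds nothing with it. A caller that wants consecutive matches would call again from the end of the previous match.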
Nico Weber
1b944b4c41 LibJS: Fix substr() with negative arguments larger than string length
length_in_code_units() returns a size_t, which is 32-bit unsigned
in i686 builds. `size + (i32)int_length` hence produced a 32-bit
unsigned result, so a negative value would wrap around and become
a very large number.

As a fix, just omit the cast -- we assign the result of max() to
a double anyway.

With this, all test262 tests in annexB/built-ins/String/prototype pass.
2022-01-14 11:12:24 +01:00
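The wraparound can be reproduced in isolation. This is a sketch under assumptions: uint32_t stands in for size_t on a 32-bit build so the result is the same on any host, and both function names are hypothetical rather than copied from the patch.

```cpp
#include <algorithm>
#include <cstdint>

// Assumption: uint32_t models size_t as it would be on a 32-bit (i686) build.
using size32 = uint32_t;

// Buggy shape: the i32 operand is converted to unsigned before the addition,
// so a negative sum wraps around to a very large value.
double start_with_cast(size32 size, double int_start)
{
    auto sum = size + static_cast<int32_t>(int_start); // type is uint32_t
    return std::max<double>(sum, 0);
}

// Fixed shape: the addition happens in double, so a negative sum stays
// negative until max() clamps it to zero.
double start_without_cast(size32 size, double int_start)
{
    return std::max(size + int_start, 0.0);
}
```

For a string of length 3 and a start of -5, the first version yields 4294967294 and the second yields 0.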
Timothy Flynn
207319ecf1 LibJS: Implement ECMA-402 String.prototype.toLocale{Lower,Upper}Case 2021-09-06 15:24:27 +01:00
Timothy Flynn
4f2cbe119b LibRegex: Allow Unicode escape sequences in capture group names
Unfortunately, this requires a slight divergence in the way the capture
group names are stored. Previously, the generated byte code would simply
store a view into the regex pattern string, so no string copying was
required.

Now, the escape sequences are decoded into a new string, and all
parsed capture group names are stored in a vector in the parser
result structure. The byte code then stores a view into the
corresponding string in that vector.
2021-08-19 23:49:25 +02:00
Timothy Flynn
2f8eb4f068 LibJS: Implement non-ECMA-402 String.prototype.toLocale{Lower,Upper}Case
In implementations without ECMA-402, these methods are to behave like
their non-locale equivalents.
2021-07-27 22:35:24 +01:00
Timothy Flynn
2dc9ae00af LibJS: Use special case folding for String.prototype.to{Lower,Upper}Case
Note we still have one String.prototype.toLowerCase test262 failure due
to not yet parsing WordBreakProperty.txt.
2021-07-27 21:04:36 +01:00
Timothy Flynn
2e3a5b884c LibJS: Implement Unicode aware String.prototype.to{Upper,Lower}Case 2021-07-26 17:03:55 +01:00
davidot
7a56ca1250 LibJS: Implement a naive String.prototype.localeCompare 2021-07-26 15:56:15 +01:00
Timothy Flynn
5a8f870594 LibJS: Implement RegExp.prototype [ @@replace ] with UTF-16 code units
This also converts the GetSubstitution abstract operation to take its input
strings as UTF-16 now that all callers are UTF-16 capable. This means
String.prototype.replace (and replaceAll) no longer needs UTF-8 and
UTF-16 copies of these strings.
2021-07-23 23:06:57 +01:00
Timothy Flynn
ee7b04f7bb LibJS: Implement RegExp.prototype [ @@split ] with UTF-16 code units 2021-07-23 23:06:57 +01:00
Timothy Flynn
66c31a0c07 LibJS: Implement RegExp.prototype [ @@search ] with UTF-16 code units 2021-07-23 23:06:57 +01:00
Timothy Flynn
2c023157e9 LibJS: Implement RegExp.prototype [ @@match ] with UTF-16 code units 2021-07-23 23:06:57 +01:00
Timothy Flynn
d3c25593b9 LibJS: Implement String.prototype.split with UTF-16 code units
Also required implementing the SplitMatch abstract operation with UTF-16
code units.
2021-07-22 09:10:44 +02:00
Timothy Flynn
733a92820b LibJS: Implement String.prototype.replaceAll with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
06208aaa15 LibJS: Implement String.prototype.replace with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
bdbe716547 LibJS: Implement String.prototype.endsWith with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
d2e63a641f LibJS: Implement String.prototype.startsWith with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
f920e121b3 LibJS: Implement String.prototype.lastIndexOf with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
5ac964d841 LibJS: Implement String.prototype.slice with UTF-16 code units
This also implements String.prototype.slice more closely to the spec
(such as handling indices equivalent to Infinity).
2021-07-22 09:10:44 +02:00
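A minimal sketch of how slice-style index clamping can handle infinite indices, assuming the arguments have already been through the spec's ToIntegerOrInfinity. The helper names are hypothetical and std::u16string stands in for LibJS's UTF-16 string type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>

// Clamp a relative index into [0, length]; negative values count from the end.
size_t clamp_relative_index(double relative, size_t length)
{
    auto len = static_cast<double>(length);
    if (std::isinf(relative))
        return relative < 0 ? 0 : length;
    if (relative < 0)
        return static_cast<size_t>(std::max(len + relative, 0.0));
    return static_cast<size_t>(std::min(relative, len));
}

// Slice by UTF-16 code units using the clamped bounds.
std::u16string slice(std::u16string const& string, double start, double end)
{
    size_t from = clamp_relative_index(start, string.size());
    size_t to = clamp_relative_index(end, string.size());
    if (from >= to)
        return {};
    return string.substr(from, to - from);
}
```

For example, slicing u"hello" from -3 to positive infinity gives u"llo".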
Timothy Flynn
eaa1360eee LibJS: Implement StringPad abstract operation with UTF-16 code units
Affects String.prototype.padStart and String.prototype.padEnd.
2021-07-22 09:10:44 +02:00
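A rough sketch of padding measured in UTF-16 code units, loosely following the shape of the spec's StringPad operation. The function name, the Placement enum, and the use of std::u16string are assumptions for illustration, not LibJS's actual interface.

```cpp
#include <cstddef>
#include <string>

enum class Placement { Start, End };

// Pads `string` to `max_length` code units by repeating `fill` and
// truncating the repeated filler to the exact number of units needed.
std::u16string string_pad(std::u16string const& string, size_t max_length, std::u16string const& fill, Placement placement)
{
    if (max_length <= string.size() || fill.empty())
        return string;

    size_t fill_length = max_length - string.size();
    std::u16string filler;
    filler.reserve(fill_length);
    for (size_t i = 0; i < fill_length; ++i)
        filler.push_back(fill[i % fill.size()]);

    return placement == Placement::Start ? filler + string : string + filler;
}
```

Because truncation happens per code unit, a filler containing a surrogate pair can be cut in half, which is the behavior JavaScript strings exhibit.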
Timothy Flynn
ef2ff5f88b LibJS: Implement String.prototype.at with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
892bfdbbcf LibJS: Implement String.prototype.substr with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
60d8852fc2 LibJS: Implement String.prototype.substring with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
767700d8a1 LibJS: Implement String.prototype.indexOf with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
70f9c7e1c7 LibJS: Implement String.prototype.includes with UTF-16 code units
This also implements String.prototype.includes more closely to the spec
(such as returning false when the search string is a RegExp object).
2021-07-22 09:10:44 +02:00
Timothy Flynn
a05ce330b8 LibJS: Implement String.prototype.codePointAt with UTF-16 code units
This also implements the CodePointAt abstract operation. This is needed
to handle invalid code units specific to the JavaScript spec, rather
than e.g. inserting replacement code units. This abstraction is public
because RegExp.prototype will also need it.
2021-07-22 09:10:44 +02:00
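A sketch of a CodePointAt-style lookup over UTF-16 code units that reports lone surrogates as-is rather than substituting a replacement. The struct and function names are hypothetical and std::u16string stands in for LibJS's string type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct CodePoint {
    uint32_t code_point;
    size_t code_unit_count;
    bool is_unpaired_surrogate;
};

constexpr bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

CodePoint code_point_at(std::u16string const& string, size_t position)
{
    assert(position < string.size());
    char16_t first = string[position];

    // Ordinary code unit: it is its own code point.
    if (!is_high_surrogate(first) && !is_low_surrogate(first))
        return { first, 1, false };

    // A low surrogate on its own, or a high surrogate at the end of the string.
    if (is_low_surrogate(first) || position + 1 == string.size())
        return { first, 1, true };

    // A high surrogate that is not followed by a low surrogate.
    char16_t second = string[position + 1];
    if (!is_low_surrogate(second))
        return { first, 1, true };

    // Valid pair: combine both halves into one code point.
    uint32_t combined = ((first - 0xD800u) << 10) + (second - 0xDC00u) + 0x10000u;
    return { combined, 2, false };
}
```

The pair 0xD83D 0xDE00 decodes to U+1F600 and consumes two code units, while a lone 0xD83D is returned unchanged with the unpaired flag set.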
Timothy Flynn
48a28a9a73 LibJS: Implement String.prototype.charCodeAt with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
5d11614bc7 LibJS: Implement String.prototype.charAt with UTF-16 code units 2021-07-22 09:10:44 +02:00
Timothy Flynn
2bba20d123 LibJS: Report string properties using UTF-16 code units
String length is reported as the number of UTF-16 code units, and string
indices are reported as the UTF-16 code units themselves.
2021-07-22 09:10:44 +02:00
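To illustrate the difference this makes, the sketch below computes the length a string would report when measured in UTF-16 code units. utf16_length is a hypothetical helper and std::u32string is only an assumed input representation.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units needed to encode a sequence of code points:
// code points above U+FFFF take a surrogate pair, everything else takes one unit.
size_t utf16_length(std::u32string const& code_points)
{
    size_t length = 0;
    for (char32_t code_point : code_points)
        length += code_point >= 0x10000 ? 2 : 1;
    return length;
}
```

A string holding "a" followed by U+1F600 therefore reports a length of 3, and indexing position 1 yields the high surrogate 0xD83D rather than the whole code point.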
Timothy Flynn
0e25d2393f LibJS: Add UTF-16 tests to String.prototype methods that already work
These methods did not require UTF-16 views, so just add test cases to
ensure they remain correct.

This also adds a couple of FIXME comments on tests that will fail even
with UTF-16 String.prototype support (for reasons such as lack of UTF-16
support in RegExp.prototype and Unicode case folding).
2021-07-22 09:10:44 +02:00
Timothy Flynn
a2e734d202 LibJS: Report string length as the code point length, not byte length
For example, U+180E is 3 bytes, but should have a string length of 1.
2021-07-17 16:59:59 +01:00
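The U+180E example can be checked with a short sketch. code_point_length is a hypothetical helper that assumes valid UTF-8 input; it is not AK's Utf8View.

```cpp
#include <cstddef>
#include <string_view>

// Counts code points in (assumed valid) UTF-8 by counting every byte that
// is not a continuation byte of the form 10xxxxxx.
size_t code_point_length(std::string_view utf8)
{
    size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

U+180E is encoded as the bytes E1 A0 8E, so the byte length is 3 while the code point length is 1.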
Timothy Flynn
87848cdf7d AK: Track byte length, rather than code point length, in Utf8View::trim
Utf8View::trim uses Utf8View::substring_view to return its result, which
requires the input to be a byte offset/length rather than code point
length.
2021-07-17 16:59:59 +01:00
Timothy Flynn
5135f4000c LibJS: Implement RegExp.prototype [ @@matchAll ]
This also allows String.prototype.matchAll to work, as all calls to that
method result in an invocation to @@matchAll.
2021-07-16 13:53:11 +01:00
Timothy Flynn
e4124d0218 LibJS: Implement RegExp.prototype [ @@split ] 2021-07-09 19:45:55 +01:00
Timothy Flynn
35a2ba8ed8 LibJS: Implement RegExp.prototype [ @@search ]
String.prototype.search is already implemented, but relies on the well-
known Symbol.search, which was not implemented.
2021-07-08 00:01:20 +01:00
Timothy Flynn
2d0589f93c LibJS: Implement global RegExp.prototype.match
Also rename the 'rx' variable to 'regexp_object' to match other RegExp
methods.
2021-07-08 00:01:20 +01:00
Timothy Flynn
b6b5adb47d LibJS: Implement RegExp.prototype.match with RegExpExec abstraction 2021-07-08 00:01:20 +01:00
Timothy Flynn
ec898a3370 LibJS: Implement RegExp.prototype.replace with RegExpExec abstraction
Also rename the 'rx' variable to 'regexp_object' to match other RegExp
methods.
2021-07-08 00:01:20 +01:00
Timothy Flynn
e0d26fff8c LibJS: Replace strings with the search value coerced to a string
This only causes 1 new test262 test to pass. Other tests that rely on
this coercion fail due to receiving an unexpected value for 'this' when
invoking a functional replacement. For example:

    String/prototype/replaceAll/replaceValue-call-matching-empty.js

Receives 'undefined' for 'this' in the functional replacement invocation
but is expected to receive the global 'this'.
2021-07-06 22:33:17 +01:00
Timothy Flynn
81fec49ac3 LibJS: Evaluate replacement value before searching source string
The String.prototype.replace spec requires evaluating the replacement
value (if it is not a function) before searching the source string.

Fixes 4 test262 tests.
2021-07-06 22:33:17 +01:00
Timothy Flynn
65003241e4 LibRegex: Allow dollar signs in ECMA262 named capture groups
Fixes 1 test262 test.
2021-07-06 22:33:17 +01:00
Timothy Flynn
8fcdc57ae1 LibJS: Coerce named captures to an object before calling GetSubstitution
Per the spec, before invoking the GetSubstitution abstraction, the named
capture groups (if not undefined) should be coerced to an object via the
ToObject abstraction.
2021-07-06 15:07:26 +01:00
Timothy Flynn
424c7eaa40 LibJS: Fix replaceAll crash for overlapping search string positions
The implementation of String.prototype.replaceAll cannot use AK's
implementation of String::find_all when finding the indices of the
search string in the source string. String::find_all will return indices
[0, 1] for String("aaa").find_all("aa") - i.e. it returns overlapping
results. This is not allowed by the JavaScript specification for
replaceAll.
2021-07-06 15:07:26 +01:00
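The difference between the two search strategies can be sketched as follows. Both functions are hypothetical illustrations built on std::string_view, not AK's String::find_all or the LibJS fix, and the empty search string is left out of scope.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Overlapping search: after each hit, resume one position later.
std::vector<size_t> find_all_overlapping(std::string_view haystack, std::string_view needle)
{
    std::vector<size_t> indices;
    if (needle.empty())
        return indices; // out of scope for this sketch
    for (size_t position = haystack.find(needle); position != std::string_view::npos;
         position = haystack.find(needle, position + 1))
        indices.push_back(position);
    return indices;
}

// Non-overlapping search: after each hit, resume past the whole match.
std::vector<size_t> find_all_non_overlapping(std::string_view haystack, std::string_view needle)
{
    std::vector<size_t> indices;
    if (needle.empty())
        return indices; // out of scope for this sketch
    for (size_t position = haystack.find(needle); position != std::string_view::npos;
         position = haystack.find(needle, position + needle.size()))
        indices.push_back(position);
    return indices;
}
```

For "aaa" and "aa", the overlapping search returns indices 0 and 1, while the non-overlapping search returns only index 0.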