When appending two strings together to form a new string, if both of the
strings are already UTF-16, create the new string as UTF-16 as well.
This shaves about 0.5 seconds off the following test262 test:
RegExp/property-escapes/generated/General_Category_-_Decimal_Number.js
This commit does not go out of its way to reduce copying of the string
data yet, but is a minimum set of changes to compile LibJS after making
PrimitiveString hold a Utf16String.
PrimitiveString may currently only be created with a UTF-8 string, and
it transcodes on the fly when a UTF-16 string is needed. Allow creating
a PrimitiveString from a UTF-16 string to avoid unnecessary transcoding
when the caller only wants UTF-16.
LibJS parses JavaScript as UTF-8, so when creating a string, we must
transcode it to UTF-16 to handle encoded surrogate pairs.
For example, consider the following string:
"\ud83d\ude00"
The UTF-8 encoding of this surrogate pair is:
0xf0 0x9f 0x98 0x80
However, LibJS will currently store the two surrogates individually as
UTF-8 encoded bytes, rather than combining the pair:
0xed 0xa0 0xb8, 0xed 0xb8 0x80
These are not equivalent. So, as String.prototype becomes UTF-16 aware,
this encoding will no longer work for abstractions like strict equality.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *