Commit graph

122 commits

Author SHA1 Message Date
Timothy Flynn
f6bee0f5a8 LibJS+LibLocale: Replace number range formatting with ICU
This uses ICU for the Intl.NumberFormat `formatRange` and
`formatRangeToParts` prototypes.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
67f3de2320 LibJS+LibLocale: Begin replacing number formatting with ICU
This uses ICU for the Intl.NumberFormat `format` and `formatToParts`
prototypes. It does not yet port the range formatter prototypes.

Most of the new code in LibLocale/NumberFormat is simply mapping from
ECMA-402 types to ICU types. Beyond that, the only algorithmic change is
that we have to mutate the output from ICU for `formatToParts` to match
what is expected by ECMA-402. This is explained in NumberFormat.cpp in
`flatten_partitions`.

This lets us remove most data from our number format generator. All that
remains are numbering system digits and symbols, which are relied upon
still for other interfaces (e.g. Intl.DateTimeFormat). So they will be
removed in a future patch.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
9724a25daf LibJS+LibLocale: Replace canonical locales and display names with ICU
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.

This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).

The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
2024-06-09 10:47:28 +02:00
Shannon Booth
e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

For us.

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
Andreas Kling
3c74dc9f4d LibJS: Segregate GC-allocated objects by type
This patch adds two macros to declare per-type allocators:

- JS_DECLARE_ALLOCATOR(TypeName)
- JS_DEFINE_ALLOCATOR(TypeName)

When used, they add a type-specific CellAllocator that the Heap will
delegate allocation requests to.

The result of this is that GC objects of the same type always end up
within the same HeapBlock, drastically reducing the ability to perform
type confusion attacks.

It also improves HeapBlock utilization, since each block now has cells
sized exactly to the type used within that block. (Previously we only
had a handful of block sizes available, and most GC allocations ended
up with a large amount of slack in their tails.)

There is a small performance hit from this, but I'm sure we can make
up for it elsewhere.

Note that the old size-based allocators still exist, and we fall back
to them for any type that doesn't have its own CellAllocator.
2023-11-19 12:10:31 +01:00
Andreas Kling
65717e3b75 LibJS: Inline fast case for Value::to_{boolean,number,numeric,primitive}
These functions all have a very common case that can be dealt with a
very simple inline check, often avoiding the need to call an out-of-line
function. This patch moves the common case to inline functions in a new
ValueInlines.h header (necessary due to header dependency issues..)

8% speed-up on the entire Kraken benchmark :^)
2023-10-07 07:13:52 +02:00
Timothy Flynn
b3694653a7 LibJS: Stop propagating small OOM errors from Intl.NumberFormat
Note this also does the same for Intl.PluralRules. The only OOM errors
propagated from Intl.PluralRules were from Intl.NumberFormat.
2023-09-05 08:08:09 +02:00
Timothy Flynn
30a812b77b LibJS: Stop propagating small OOM errors from Intl.MathematicalValue 2023-09-05 08:08:09 +02:00
Timothy Flynn
b6ff25bd26 LibJS: Stop propagating small OOM errors from Intl abstract operations 2023-09-05 08:08:09 +02:00
Timothy Flynn
ca0d926036 LibJS: Use decimal compact patterns for currency style sub-patterns
When formatting a currency style pattern with compact notation, we were
(trying to) doubly insert the currency symbol into the formatted string.
We would first look up the currency pattern in GetNumberFormatPattern
(for the en locale, this is "¤#,##0.00", which our generator transforms
to "{currency}{number}").

When we hit the "{number}" field, NumberFormat will do a second lookup
for the compact pattern to use for the number being formatted. By using
the currency compact patterns, we receive a second pattern that also has
the currency symbol (for the en locale, if formatting the number 1000,
this is "¤0K", which our generator transforms to
"{currency}{number}{compactIdentifier:0}". This second lookup is not
supposed to have currency symbols (or any other symbols), thus we hit a
VERIFY_NOT_REACHED().

Instead, we are meant to use the decimal compact pattern, and allow the
currency symbol to be handled by only the outer currency pattern.
2023-09-04 18:22:28 +02:00
Timothy Flynn
0914e86691 LibLocale+LibJS: Make number format APIs infallible
These APIs only perform small allocations, and are only used by LibJS.
Callers which could only have failed from these APIs are also made to
be infallible here.
2023-08-23 05:29:21 +02:00
Timothy Flynn
cd526813e6 LibLocale+LibJS: Make locale data APIs infallible
These APIs only perform small allocations, and are only used by LibJS.
Callers which could only have failed from these APIs are also made to
be infallible here.
2023-08-23 05:29:21 +02:00
Timothy Flynn
b0c8543b28 LibJS: Compute NumberFormat's rounding priority during construction
This is an editorial change in the ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/c28118e
2023-08-14 07:48:54 -04:00
Andreas Kling
c084269e5f LibJS: Make PrimitiveString::utf8_string() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
1a27c525d5 LibJS: Make PrimitiveString::create() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Lucas CHOLLET
3f35ffb648 Userland: Prefer _string over _short_string
As `_string` can't fail anymore (since 3434412), there are no real
benefits to use the short variant in most cases.
2023-08-08 07:37:21 +02:00
Timothy Flynn
7e0083fb65 LibJS: Rename ErrorType::IntlNumberIsNaN to ErrorType::NumberIsNaN
It will be used outside of the Intl namespace, so give it a less overly
specific name.
2023-06-26 10:39:07 +02:00
Timothy Flynn
f816a24b86 LibJS: Update spec numbers for the Intl NumberFormat v3 proposal
This proposal has been merged into the main ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/4257160

Note this includes some editorial and normative changes made when the
proposal was merged into the main spec, but are not in the proposal spec
itself. In particular, the following AOs were changed:

    PartitionNumberRangePattern (normative)
    SetNumberFormatDigitOptions (editorial)
2023-04-11 23:22:32 +02:00
Timothy Flynn
b411e30024 LibJS: Require a [[RoundingMode]] slot within FormatNumericToString
This was optional to work around a spec issue. That issue was fixed and
brought into LibJS in commit 5b3b14b, but this FIXME was neglected.
2023-04-11 23:22:32 +02:00
Linus Groh
09d40bfbb2 Everywhere: Use _{short_,}string to create Strings from literals 2023-02-25 20:51:49 +01:00
Timothy Flynn
b4113536ef LibJS: Use substrings-with-superstrings in Intl.NumberFormat's grouping
To add grouping to a number, we take a string such as "123456.123" and
break it into integer and fraction parts. Then we take the integer part
and break it into locale-specific sized groups to inject the locale's
group separator (e.g. a comma in en-US). We currently create new strings
for each of these groups. Instead, we can use the shared superstring
method to avoid all of that string copying.
2023-02-18 20:00:15 +01:00
Timothy Flynn
c3abb1396c LibJS+LibWeb: Convert string view PrimitiveString instances to String
First, this adds an overload of PrimitiveString::create for StringView.
This overload will throw an OOM completion if creating a String fails.
This is not only a bit more convenient, but it also ensures at compile
time that all PrimitiveString::create(string_view) invocations will be
handled as String and OOM-aware.

Next, this wraps all invocations to PrimitiveString::create(string_view)
with MUST_OR_THROW_OOM.

A small PrimitiveString::create(DeprecatedFlyString) overload also had
to be added to disambiguate between the StringView and DeprecatedString
overloads.
2023-02-09 17:13:33 +00:00
Timothy Flynn
89da8de4ca LibJS+LibLocale: Propagate OOM from CLDR NumberFormat Vector operations 2023-02-08 18:32:37 +00:00
Timothy Flynn
822ee35f7a LibJS: Propagate OOM from Intl.NumberFormat Vector operations 2023-02-08 18:32:37 +00:00
Timothy Flynn
ea13f3e285 LibJS: Propagate OOM from PatternPartitionWithSource factory 2023-02-08 18:32:37 +00:00
Timothy Flynn
e74e8381d5 LibJS: Allow "approximately" results to differ in plural form
This is a normative change in the Intl.NumberFormat V3 spec. See:
https://github.com/tc39/proposal-intl-numberformat-v3/commit/08f599b

Note that this didn't seem to actually affect our implementation. The
Unicode spec states:

https://www.unicode.org/reports/tr35/tr35-53/tr35-numbers.html#Plural_Ranges
"If there is no value for a <start,end> pair, the default result is end"

Therefore, our implementation did not have the behavior noted by the
issue this normative change addressed:

    const pr = new Intl.PluralRules("en-US");
    pr.selectRange(1, 1); // Is "other", should be "one"

Our implementation already returned "one" here because there is no such
<start=one, end=one> value in the CLDR for en-US. Thus, we already
returned the end value of "one".
2023-01-30 14:10:07 -05:00
Timothy Flynn
6a50fb465c LibJS: Make use of the Intl MV in more Intl.NumberFormat AOs
This is an editorial change in the Intl.NumberFormat V3 spec. See:
https://github.com/tc39/proposal-intl-numberformat-v3/commit/c24b33e

Note our implementation was already using the Intl MV in these AOs just
due to C++ type safety.
2023-01-30 12:19:14 -05:00
Timothy Flynn
4475f21e9e LibJS: Allow locale approximately signs to be empty in Intl.NumberFormat
This is a normative change in the Intl.NumberFormat V3 spec. See:
https://github.com/tc39/proposal-intl-numberformat-v3/commit/23e69cf

This isn't particularly testable because every locale in the CLDR has a
non-empty "approximatelySign" field in cldr-numbers-modern. The issue
for this change seems to be considering the "miscPatterns/approximately"
field instead, which has different semantics. But as noted on the CLDR
issue https://unicode-org.atlassian.net/browse/CLDR-14918, the ICU uses
the "approximatelySign" field (as do our implementation).
2023-01-30 12:19:14 -05:00
Timothy Flynn
a824e1ac6a LibJS: Remove last use of DeprecatedString from Intl.MathematicalValue 2023-01-28 00:13:59 +00:00
Timothy Flynn
5e29e04122 LibJS+LibLocale: Propagate errors from find_regional_values_for_locale
This had quite the footprint.
2023-01-27 18:00:17 +00:00
Timothy Flynn
0c2efa285a LibJS+LibLocale: Port Intl.NumberFormat to String 2023-01-24 16:23:50 -05:00
Timothy Flynn
1bcde5d216 LibJS: Port ListFormat and PatternPartition to String 2023-01-22 01:03:13 +00:00
Timothy Flynn
95d1678553 LibJS: Mark infallible operations that may throw only due to OOM 2023-01-20 20:31:38 +00:00
Timothy Flynn
1e6e719592 LibJS: Propagate OOM errors from the PartitionPattern Abstract Operation 2023-01-19 20:57:30 +00:00
Timothy Flynn
0ff4d8100f LibJS: Consistently use spaces / parentheses in NumberFormat operations
These are editorial changes in the ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/1508825
https://github.com/tc39/ecma402/commit/760f23a
2023-01-14 19:12:48 +00:00
Timothy Flynn
0ffad2a2d1 LibJS: Refer to String elements as code units rather than characters
This is an editorial change in the ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/d6b3435
2023-01-14 19:12:48 +00:00
Timothy Flynn
2cca5d6676 LibJS: Fix assignment of "isNegative" in FormatNumericToString
These are normative changes in the Intl.NumberFormat v3 spec. See:
https://github.com/tc39/proposal-intl-numberformat-v3/commit/5a2b1d1
https://github.com/tc39/proposal-intl-numberformat-v3/commit/cd48a3d
2023-01-14 19:12:48 +00:00
Timothy Flynn
d30e96a209 LibJS: Renumber Intl.NumberFormat v3 prototypes and AOs
These are editorial changes in the Intl.NumberFormat v3 spec. See:
https://github.com/tc39/proposal-intl-numberformat-v3/commit/82e2f92
https://github.com/tc39/proposal-intl-numberformat-v3/commit/ce6c33e
https://github.com/tc39/proposal-intl-numberformat-v3/commit/b982783
https://github.com/tc39/proposal-intl-numberformat-v3/commit/96010f4
https://github.com/tc39/proposal-intl-numberformat-v3/commit/9dd123f
https://github.com/tc39/proposal-intl-numberformat-v3/commit/0c2834f
https://github.com/tc39/proposal-intl-numberformat-v3/commit/31c72f3
2023-01-14 19:12:48 +00:00
Timothy Flynn
d1881da2be LibJS: Set approximate number range format result's "source" to "shared"
This is a normative change in the Intl.NumberFormat v3 spec. See:
https://github.com/tc39/proposal-intl-numberformat-v3/commit/7510e7f
2023-01-14 19:12:48 +00:00
Timothy Flynn
a59ebdac2d LibJS+Everywhere: Return strings by value from PrimitiveString
It turns out return a ThrowCompletionOr<T const&> is flawed, as the GCC
expansion trick used with TRY will always make a copy. PrimitiveString
is luckily the only such use case.
2023-01-13 18:50:47 -05:00
Timothy Flynn
115baa7e32 LibJS+Everywhere: Make PrimitiveString and Utf16String fallible
This makes construction of Utf16String fallible in OOM conditions. The
immediate impact is that PrimitiveString must then be fallible as well,
as it may either transcode UTF-8 to UTF-16, or create a UTF-16 string
from ropes.

There are a couple of places where it is very non-trivial to propagate
the error further. A FIXME has been added to those locations.
2023-01-08 12:13:15 +01:00
Timothy Flynn
2dfa87814e LibJS: Update spec comments for replacing digits in Intl.NumberFormat
This is an editorial change in the ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/06d95ed

Note the new spec steps basically match our implementation in LibLocale.
2022-12-15 16:24:29 +00:00
Andreas Kling
4abdb68655 LibJS: Remove Object(Object& prototype) footgun
This constructor was easily confused with a copy constructor, and it was
possible to accidentally copy-construct Objects in at least one way that
we dicovered (via generic ThrowCompletionOr construction).

This patch adds a mandatory ConstructWithPrototypeTag parameter to the
constructor to disambiguate it.
2022-12-14 15:11:57 +01:00
Linus Groh
ddc6e139a6 LibJS: Convert Object::create() to NonnullGCPtr 2022-12-14 09:59:45 +00:00
Linus Groh
91b0123eaf LibJS: Convert Array::create{,_from}() to NonnullGCPtr 2022-12-14 09:59:45 +00:00
Linus Groh
525f22d018 LibJS: Replace standalone js_string() with PrimitiveString::create()
Note that js_rope_string() has been folded into this, the old name was
misleading - it would not always create a rope string, only if both
sides are not empty strings. Use a three-argument create() overload
instead.
2022-12-07 16:43:06 +00:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Timothy Flynn
d56205f991 LibJS: Use more accurate number-to-string method in Intl.NumberFormat
Intl.NumberFormat only ever wants literal number-to-digits here, without
extra exponential formatting.
2022-11-04 21:12:10 +00:00
Timothy Flynn
ff48220dca Userland: Move files destined for LibLocale to the Locale namespace 2022-09-05 14:37:16 -04:00