Commit graph

50 commits

Author SHA1 Message Date
Timothy Flynn
f6bee0f5a8 LibJS+LibLocale: Replace number range formatting with ICU
This uses ICU for the Intl.NumberFormat `formatRange` and
`formatRangeToParts` prototypes.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
67f3de2320 LibJS+LibLocale: Begin replacing number formatting with ICU
This uses ICU for the Intl.NumberFormat `format` and `formatToParts`
prototypes. It does not yet port the range formatter prototypes.

Most of the new code in LibLocale/NumberFormat is simply mapping from
ECMA-402 types to ICU types. Beyond that, the only algorithmic change is
that we have to mutate the output from ICU for `formatToParts` to match
what is expected by ECMA-402. This is explained in NumberFormat.cpp in
`flatten_partitions`.

This lets us remove most data from our number format generator. All that
remains are numbering system digits and symbols, which are relied upon
still for other interfaces (e.g. Intl.DateTimeFormat). So they will be
removed in a future patch.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
9724a25daf LibJS+LibLocale: Replace canonical locales and display names with ICU
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.

This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).

The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
2024-06-09 10:47:28 +02:00
Andreas Kling
3c74dc9f4d LibJS: Segregate GC-allocated objects by type
This patch adds two macros to declare per-type allocators:

- JS_DECLARE_ALLOCATOR(TypeName)
- JS_DEFINE_ALLOCATOR(TypeName)

When used, they add a type-specific CellAllocator that the Heap will
delegate allocation requests to.

The result of this is that GC objects of the same type always end up
within the same HeapBlock, drastically reducing the ability to perform
type confusion attacks.

It also improves HeapBlock utilization, since each block now has cells
sized exactly to the type used within that block. (Previously we only
had a handful of block sizes available, and most GC allocations ended
up with a large amount of slack in their tails.)

There is a small performance hit from this, but I'm sure we can make
up for it elsewhere.

Note that the old size-based allocators still exist, and we fall back
to them for any type that doesn't have its own CellAllocator.
2023-11-19 12:10:31 +01:00
Timothy Flynn
b3694653a7 LibJS: Stop propagating small OOM errors from Intl.NumberFormat
Note this also does the same for Intl.PluralRules. The only OOM errors
propagated from Intl.PluralRules were from Intl.NumberFormat.
2023-09-05 08:08:09 +02:00
Timothy Flynn
0914e86691 LibLocale+LibJS: Make number format APIs infallible
These APIs only perform small allocations, and are only used by LibJS.
Callers which could only have failed from these APIs are also made to
be infallible here.
2023-08-23 05:29:21 +02:00
Timothy Flynn
b0c8543b28 LibJS: Compute NumberFormat's rounding priority during construction
This is an editorial change in the ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/c28118e
2023-08-14 07:48:54 -04:00
Timothy Flynn
b411e30024 LibJS: Require a [[RoundingMode]] slot within FormatNumericToString
This was optional to work around a spec issue. That issue was fixed and
brought into LibJS in commit 5b3b14b, but this FIXME was neglected.
2023-04-11 23:22:32 +02:00
Matthew Olsson
7c0c1c8f49 LibJS+LibWeb: Wrap raw JS::Cell*/& fields in GCPtr/NonnullGCPtr 2023-03-15 08:48:49 +01:00
Timothy Flynn
c3abb1396c LibJS+LibWeb: Convert string view PrimitiveString instances to String
First, this adds an overload of PrimitiveString::create for StringView.
This overload will throw an OOM completion if creating a String fails.
This is not only a bit more convenient, but it also ensures at compile
time that all PrimitiveString::create(string_view) invocations will be
handled as String and OOM-aware.

Next, this wraps all invocations to PrimitiveString::create(string_view)
with MUST_OR_THROW_OOM.

A small PrimitiveString::create(DeprecatedFlyString) overload also had
to be added to disambiguate between the StringView and DeprecatedString
overloads.
2023-02-09 17:13:33 +00:00
Timothy Flynn
5e29e04122 LibJS+LibLocale: Propagate errors from find_regional_values_for_locale
This had quite the footprint.
2023-01-27 18:00:17 +00:00
Timothy Flynn
0c2efa285a LibJS+LibLocale: Port Intl.NumberFormat to String 2023-01-24 16:23:50 -05:00
Timothy Flynn
bc9a440f31 LibJS: Use correct type for NumberFormat's UseGrouping internal slot
This was converted to an enumeration for Intl.NumberFormat V3 in commit
33698b9615, but the default value was not
updated (and it's a bit surprising it compiled at all, given that this
is an 'enum class').
2023-01-24 16:23:50 -05:00
Timothy Flynn
1bcde5d216 LibJS: Port ListFormat and PatternPartition to String 2023-01-22 01:03:13 +00:00
Timothy Flynn
bb4b6d8ce3 LibJS: Port Intl locale resolution to String 2023-01-19 20:57:30 +00:00
Timothy Flynn
1e6e719592 LibJS: Propagate OOM errors from the PartitionPattern Abstract Operation 2023-01-19 20:57:30 +00:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Timothy Flynn
43a3471298 LibLocale: Move locale source files to the LibLocale folder
These are still included in LibUnicode, but this updates their location
and the include paths of other files which include them.
2022-09-05 14:37:16 -04:00
Timothy Flynn
ff48220dca Userland: Move files destined for LibLocale to the Locale namespace 2022-09-05 14:37:16 -04:00
Andreas Kling
35c9aa7c05 LibJS: Hide all the constructors!
Now that the GC allocator is able to invoke Cell subclass constructors
directly via friendship, we no longer need to keep them public. :^)
2022-08-29 03:24:54 +02:00
Linus Groh
f9705eb2f4 LibJS: Replace GlobalObject with VM in Intl AOs [Part 1/19]
Instead of passing a GlobalObject everywhere, we will simply pass a VM,
from which we can get everything we need: common names, the current
realm, symbols, arguments, the heap, and a few other things.

In some places we already don't actually need a global object and just
do it for consistency - no more `auto& vm = global_object.vm();`!

This will eventually automatically fix the "wrong realm" issue we have
in some places where we (incorrectly) use the global object from the
allocating object, e.g. in call() / construct() implementations. When
only ever a VM is passed around, this issue can't happen :^)

I've decided to split this change into a series of patches that should
keep each commit down do a somewhat manageable size.
2022-08-23 13:58:30 +01:00
Timothy Flynn
e9e187d15c LibJS: Implement Intl.NumberFormat.prototype.formatRangeToParts 2022-07-20 22:30:16 +01:00
Timothy Flynn
b4a772cde2 LibJS: Implement Intl.NumberFormat.prototype.formatRange 2022-07-20 22:30:16 +01:00
Timothy Flynn
292b8908b5 LibJS: Hook the Intl mathematical value into Intl.NumberFormat 2022-07-20 18:21:24 +01:00
Timothy Flynn
99b79766cd LibJS: Return an enum from ApplyUnsignedRoundingMode
After the Intl MV is implemented, returning a copy of the desired value
here may involve copying non-trivial data. Instead, return an enum to
indicate which decision was made.
2022-07-20 18:21:24 +01:00
Timothy Flynn
8ee485c350 LibJS: Implement Intl.NumberFormat V3's [[RoundingMode]] changes 2022-07-18 23:37:31 +01:00
Timothy Flynn
bb9a44cd50 LibJS: Implement Intl.NumberFormat V3's [[RoundingPriority]] changes 2022-07-18 08:51:07 +01:00
Timothy Flynn
c367bcb5f8 LibJS: Remove accidentally duplicated [[RoundingType]] enumeration
This is defined in NumberFormat's base class.
2022-07-18 08:51:07 +01:00
Timothy Flynn
33698b9615 LibJS+js: Parse new constructor options from Intl.NumberFormat V3
This contains minimal changes to parse newly added and modified options
from the Intl.NumberFormat V3 proposal, while maintaining main spec
behavior in Intl.NumberFormat.prototype.format. The parsed options are
reflected only in Intl.NumberFormat.prototype.resolvedOptions and the js
REPL.
2022-07-13 19:22:26 +01:00
Timothy Flynn
5b68c1a06c LibJS: Use Intl.PluralRules within Intl.NumberFormat
This also allows removing a bit of a BigInt hack to resolve plurality of
BigInt numbers (because the AOs used in ResolvePlural support BigInt,
wherease the naive Unicode::select_pattern_with_plurality did not).

We use cardinal form here; the number format patterns in the CLDR align
with the cardinal form of the plural rules.
2022-07-08 20:33:52 +02:00
Timothy Flynn
2982aa0373 LibJS: Mark the NumberFormat parameter of FormatNumericToString as const
Not critical, but in subsequent commits this will be invoked from a
constant context.
2022-07-08 11:51:54 +02:00
Timothy Flynn
812d3a7ef8 LibJS: Reorganize spec steps for Intl.NumberFormat
This is an editorial change in the Intl spec:
https://github.com/tc39/ecma402/commit/110cb1f
2022-03-15 17:30:58 +01:00
Timothy Flynn
6efbafa6e0 Everywhere: Update copyrights with my new serenityos.org e-mail :^) 2022-01-31 18:23:22 +00:00
Timothy Flynn
a0253af8c1 LibJS: Generalize Intl.NumberFormat to operate on Value types
Intl.NumberFormat is meant to format both Number and BigInt types. To
prepare for formatting BigInt types, this generalizes our NumberFormat
implementation to operate on Value instances rather than doubles. All
arithmetic is moved to static helpers that can now be updated with
BigInt semantics.
2022-01-30 20:05:27 +00:00
Timothy Flynn
ac3e42a8de LibJS: Move some Intl.NumberFormat fields into a NumberFormatBase class
Other Intl objects, such as PluralRules, are to be treated as a
NumberFormat object in some AOs. There's only a handful of fields which
are to be shared between those objects - move them to a base class for
shared reuse.

This also updates the couple of NumberFormat AOs that are meant to
operate on these NumberFormat-like objects.

Alternatively, we could just have objects like PluralRules inherit from
NumberFormat directly. But that messes up the is<NumberFormat> runtime
checks, so this feels safer.
2022-01-28 19:38:47 +00:00
Timothy Flynn
0865f71d37 LibJS: Convert Intl.NumberFormat to use Unicode::Style 2022-01-25 19:02:59 +00:00
mjz19910
10ec98dd38 Everywhere: Fix spelling mistakes 2022-01-07 15:44:42 +01:00
Timothy Flynn
d2588d852b LibJS: Change all [[RelevantExtensionKeys]] to return constexpr arrays
There's no need to allocate a vector for this internal slot. Similar to
commit: bb11437792
2021-12-01 16:36:26 +00:00
Timothy Flynn
914675e826 LibJS+LibUnicode: Separate number formatting methods from Locale.h
Currently, we generate separate data files for locale and number format
related tables/methods, but provide public accessors for all of the data
in one Locale.h file. Rather than continuing this trend for date-time,
relative time, etc. formatting, it's a bit easier to reason about if the
public accessors are also in separate files.
2021-11-29 22:48:46 +00:00
Timothy Flynn
a1d5849e67 LibJS: Implement unit number formatting 2021-11-16 23:14:09 +00:00
Timothy Flynn
80b86d20dc LibJS: Cache the number format used for compact notation
Finding the best number format to use for compact notation involves
creating a Vector of all compact formats for the locale and looking for
the one that best matches the number's magnitude. ECMA-402 wants this
number format to be found multiple times, so cache the result for future
use.
2021-11-16 00:56:55 +00:00
Timothy Flynn
4d79ab6866 LibJS: Implement engineering and scientific number formatting 2021-11-14 17:00:35 +00:00
Timothy Flynn
3450def494 LibJS: Implement Intl.NumberFormat.prototype.formatToParts 2021-11-13 19:01:25 +00:00
Timothy Flynn
c65dea64bd LibJS+LibUnicode: Don't remove {currency} keys in GetNumberFormatPattern
In order to implement Intl.NumberFormat.prototype.formatToParts, do not
replace {currency} keys in the format pattern before ECMA-402 tells us
to. Otherwise, the array return by formatToParts will not contain the
expected currency key.

Early replacement was done to avoid resolving the currency display more
than once, as it involves a couple of round trips to search through
LibUnicode data. So this adds a non-standard method to NumberFormat to
do this resolution and cache the result.

Another side effect of this change is that LibUnicode must replace unit
format patterns of the form "{0} {1}" during code generation. These were
previously skipped during code generation because LibJS would just
replace the keys with the currency display at runtime. But now that the
currency display injection is delayed, any {0} or {1} keys in the format
pattern will cause PartitionNumberPattern to abort.
2021-11-13 19:01:25 +00:00
Timothy Flynn
a701ed52fc LibJS+LibUnicode: Fully implement currency number formatting
Currencies are a bit strange; the layout of currency data in the CLDR is
not particularly compatible with what ECMA-402 expects. For example, the
currency format in the "en" and "ar" locales for the Latin script are:

    en: "¤#,##0.00"
    ar: "¤\u00A0#,##0.00"

Note how the "ar" locale has a non-breaking space after the currency
symbol (¤), but "en" does not. This does not mean that this space will
appear in the "ar"-formatted string, nor does it mean that a space won't
appear in the "en"-formatted string. This is a runtime decision based on
the currency display chosen by the user ("$" vs. "USD" vs. "US dollar")
and other rules in the Unicode TR-35 spec.

ECMA-402 shies away from the nuances here with "implementation-defined"
steps. LibUnicode will store the data parsed from the CLDR however it is
presented; making decisions about spacing, etc. will occur at runtime
based on user input.
2021-11-13 11:52:45 +00:00
Timothy Flynn
89523f70cf LibJS: Begin implementing Intl.NumberFormat.prototype.format
There is quite a lot to be done here so this is just a first pass at
number formatting. Decimal and percent formatting are mostly working,
but only for standard and compact notation (engineering and scientific
notation are not implemented here). Currency formatting is parsed, but
there is more work to be done to handle e.g. using symbols instead of
currency codes ("$" instead of "USD"), and putting spaces around the
currency symbol ("USD 2.00" instead of "USD2.00").
2021-11-12 09:17:08 +00:00
Idan Horowitz
768009e005 LibJS: Convert NumberFormat AOs to ThrowCompletionOr 2021-09-18 22:59:15 +03:00
Timothy Flynn
7769cd2cab LibJS: Move number_format_relevant_extension_keys to Intl.NumberFormat
This method represents the Intl.NumberFormat's [[RelevantExtensionKeys]]
internal slot, so it makes more sense for this to be directly in the
class itself.
2021-09-12 12:57:17 +01:00
Timothy Flynn
94a5a0437c LibJS: Move Intl.NumberFormat's AOs to its object file 2021-09-12 12:57:17 +01:00
Timothy Flynn
07f12b108b LibJS: Implement a nearly empty Intl.NumberFormat object
This adds plumbing for the Intl.NumberFormat object, constructor, and
prototype.
2021-09-11 11:05:50 +01:00