ICU 72 began using non-ASCII spaces in some formatted date-time strings.
Every major browser has found that this introduced major breakage in web
compatibility, as many sites and tools expect ASCII spaces. This patch
removes these non-ASCII spaces in the same manner as the major engines.
Such behavior is also tested by WPT.
The typeof operator has a very small set of possible resulting strings,
so let's make it much faster by caching those strings on the VM.
~8x speed-up on this microbenchmark:
for (let i = 0; i < 10_000_000; ++i) {
typeof i;
}
For performance, rather than slowly incrementing the capacity of the
rope string's buffer, compute an approximate length for that buffer to
be reserved up front.
If no header includes the prototype of a function, then it cannot be
used from outside the translation unit it was defined in. In that case,
it should be marked as `static`, in order to avoid possible ODR
problems, unnecessary exported symbols, and allow the compiler to better
optimize those.
If this warning triggers in a function defined in a header, `inline`
needs to be added, otherwise if the header is included in more than one
TU, it will fail to link with a duplicate definition error.
The reason this diff got so big is that Lagom-only code wasn't built
with this flag even in Serenity times.
The JS::Error types all store their exception messages as a String. So
by using ByteString, we hit the StringView constructor, and end up
allocating the same string twice.
Implement for CreatePerIterationEnvironment for 'for' loops per the Ecma
Standard. This ensures each iteration of a 'for' loop has its own
lexical environment so that variables declared in the loop are scoped to
the current iteration.
On PowerPC 64 pointers can use all 64 bits, however by convention on
Linux user-space addresses use only the lower 43 bits.
I'm not 100% certain that the masking off of the 16 high bits is the
proper solution, but it matches the rest of the LibJS code which assumes
pointers only use the lower 48 bits.
https://www.kernel.org/doc/ols/2001/ppc64.pdf
These were just here to create a nicer error message for debugging.
A release build will still crash in the exact same place, but now you'll
need to get a backtrace the normal way instead.
We now defer looking up the various identifiers by IdentifierTableIndex
until the last moment. This allows us to avoid the retrieval in common
cases like when a property access is cached.
Knocks a ~12% item off the profile on https://ventrella.com/Clusters/
This allows global `let` and `const` variable accesses to be cached
by the GetGlobal instruction, and works even when the access is in a
different translation unit from the declaration.
Knocks a ~10% item off the profile on https://ventrella.com/Clusters/
When compiling with `-O2 -g1` optimizations (as done in the main
Serenity build), no out-of-line definitions end up emitted for
`Value::to_numeric`, causing files that reference the function but don't
include the definition from `ValueInlines.h` to add an undefined
reference in LibJS.so.
(cherry picked from commit 85b7ce8c2f6daf0db80e801d7fb2503d070765ce)
The `deepEquals` algorithm used for testing was naive, and incorrectly
evaluated equality of objects in some cases. The new algorithm considers
that the second object could have more keys than the first, and compares
the prototypes of the objects.
The changes to tests are due to LibTimeZone incorrectly interpreting
time stamps in the TZDB. The TZDB will list zone transitions in either
UTC or the zone's local time (which is then subject to DST offsets).
LibTimeZone did not handle the latter at all.
For example:
The following rule is in effect until November 18, 6PM UTC.
America/Chicago -5:50:36 - LMT 1883 Nov 18 18:00u
The following rule is in effect until March 1, 2AM in Chicago time. But
at that time, a DST transition occurs, so the local time is actually
3AM.
America/Chicago -6:00 Chicago C%sT 1936 Mar 1 2:00
This required updating some LibJS spec steps to their latest versions,
as the data expected by the old steps does not quite match the APIs that
are available with the ICU. The new spec steps are much more aligned.
LibLocale was split off from LibUnicode a couple years ago to reduce the
number of applications on SerenityOS that depend on CLDR data. Now that
we use ICU, both LibUnicode and LibLocale are actually linking in this
data. And since vcpkg gives us static libraries, both libraries are over
30MB in size.
This patch reverts the separation and merges LibLocale into LibUnicode
again. We now have just one library that includes the ICU data.
Further, this will let LibUnicode share the locale cache that previously
would only exist in LibLocale.
When resuming execution of a suspended function, we must not overwrite
any cached `this` value with something from the ExecutionContext.
This was causing an empty JS::Value to leak into the VM when resuming
an async arrow function, since the "this mode" for such functions is
lexical and thus ExecutionContext will have an empty `this`.
It became a problem due to the bytecode optimization where we allow
ourselves to assume that `this` remains cached after we've executed a
ResolveThisBinding in this (or the first) basic block of the executable.
Fixes https://github.com/LadybirdBrowser/ladybird/issues/138
There have been a number of changes to the locale resolution AOs that
we've fallen behind on. Mostly editorial, but includes one normative
change to canonicalize Unicode extension keywords in the Intl.Locale
constructor.
Instead of taking an out-parameter, return the canonicalization result.
This allows the API to be used where specs want to store the result and
the original values in separate variables.
ListFormat was the first formatter I ported to ICU. This patch makes it
match the style of subsequently ported formatters, where we create the
formatter once per Intl object, rather than once per prototype
invocation.
We were previously treating undefined and null as the same (an empty
Optional). However, there are edge cases in ECMA-402 where we must treat
them differently. Namely, the hour cycle (hc) keyword. An undefined hc
value has no effect on the resolved locale, whereas a null hc value can
actively override any hc specified in the locale string. For example:
new Intl.DateTimeFormat("en-u-hc-h11", { hour12: false });
In that object, the hour12 option does not match the u-hc-h11 value. So
the spec dictates we remove the hc value by setting it to null.
This uses ICU for all of the Intl.PluralRules prototypes, which lets us
remove all data from our plural rules generator.
Plural rules depend directly on internal data from the number formatter,
so rather than creating a separate Locale::PluralRules class (which will
make accessing that data awkward), this adds plural rules APIs to the
existing Locale::NumberFormat.
This is actually safe everywhere but in the topmost program scope.
For web compatibility reasons, we have to flush all top-level bindings
to the environment, in case a subsequent separate <script> program
comes looking for them.
Instead of displaying locals as "locN", we now show them as "name~N".
This makes it a lot easier to follow bytecode dumps, especially in
longer functions.
Note that we keep displaying the local index, to avoid confusion in case
there are multiple separate locals with the same name in one executable.
Linking LibLocale publicly ensures that libicudata.a is also available
in all embedders of LibJS. Otherwise, ICU crashes in hard-to-track-down
ways at runtime when the data is not available.
The proposal has undergone quite a few normative changes since we last
synced with it. There was a time when it could not be implemented as it
was written, which is no longer the case. The resulting proposal has had
so many changes compared to our implementation, that it wouldn't make
sense to implement them commit-by-commit as we normally do. So instead,
this just implements the HEAD revision of the spec in one pass.
This uses ICU for the Intl.DateTimeFormat `format` `formatToParts`,
`formatRange`, and `formatRangeToParts`.
This lets us remove most data from our date-time format generator. All
that remains are time zone data and locale week info, which are relied
upon still for other interfaces. So they will be removed in a future
patch.
Note: All of the changes to the test files in this patch are now aligned
with other browsers. This includes:
* Some very incorrect formatting of Japanese symbols. (Looking at the
old results now, it's very obvious they were wrong.)
* Old FIXMEs regarding range formatting not including the start/end date
when only time fields were requested, but the dates differ.
* Day period inconsistencies.
Properties marked with the [[Unimplemented]] attribute behave as normal
but invoke the `VM::on_unimplemented_property_access callback` when
they are accessed.
The IntlMV is meant to be arbitrarily precise. If the user provides a
string value to be formatted, we lose precision by converting extremely
large values to a double. We were never able to address this, as support
for arbitrary precision was a big FIXME. But ICU can handle it by just
passing the raw string on through.
This uses ICU for the Intl.NumberFormat `formatRange` and
`formatRangeToParts` prototypes.
Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
This uses ICU for the Intl.NumberFormat `format` and `formatToParts`
prototypes. It does not yet port the range formatter prototypes.
Most of the new code in LibLocale/NumberFormat is simply mapping from
ECMA-402 types to ICU types. Beyond that, the only algorithmic change is
that we have to mutate the output from ICU for `formatToParts` to match
what is expected by ECMA-402. This is explained in NumberFormat.cpp in
`flatten_partitions`.
This lets us remove most data from our number format generator. All that
remains are numbering system digits and symbols, which are relied upon
still for other interfaces (e.g. Intl.DateTimeFormat). So they will be
removed in a future patch.
Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.
This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).
The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
NetBSD and FreeBSD get upset when we don't set the fd to an invalid
value when using a non-shared mapping.
Reported-By: Thomas Klausner <wiz@gatalith.at>
Instead of scanning through the list of seen constants, we now have a
more structured storage of the constants true, false, null, undefined,
and every possible Int32 value.
This fixes an O(n^2) issue found by Kraken/json-stringify-tinderbox.js
This turns expressions like `(2 + 3) * 8 / 2` into a constant (20)
at bytecode compilation time instead of generating instructions
to calculate the value.
This is a new Bytecode::Generator helper that takes an operand and
returns the same operand, or a copy of it, in case a copy is required
to preserve correct evaluation order.
This can be used in a bunch of places where we're worried about
clobbering some value after obtaining it.
Practically, locals are always copied, and temporary registers as well
as constants are returned as-is.
Functions that don't have a FunctionEnvironment will get their `this`
value from the ExecutionContext. This patch stops generating
ResolveThisBinding instructions at all for functions like that, and
instead pre-populates the `this` register when entering a new bytecode
executable.
We already have a dedicated register slot for `this`, so instead of
having ResolveThisBinding take a `dst` operand, just write the value
directly into the `this` register every time.
...that maintains a list of allocated execution contexts so malloc()
does not have to be called every time we need to get a new one.
20% improvement in Octane/typescript.js
16% improvement in Kraken/imaging-darkroom.js
By using separate struct we can avoid updating AST node and
ECMAScriptFunctionObject constructors every time there is a need to
add or remove some additional information colllected during parsing.
Allows to skip function environment allocation for non-arrow functions
if the only reason it is needed is to hold `this` binding.
The parser is changed to do following:
- If a function is an arrow function and uses `this` then all functions
in a scope chain are marked to allocate function environment for
`this` binding.
- If a function uses `new.target` then all functions in a scope chain
are marked to allocate function environment.
`ordinary_call_bind_this()` is changed to put `this` value in execution
context when function environment allocation is skipped.
35% improvement in Octane/typescript.js
50% improvement in Octane/deltablue.js
19% improvement in Octane/raytrace.js
This allows us to skip allocating a function environment in cases where
it was previously impossible because the arguments object needed a
binding.
This change does not bring visible improvement in Kraken or Octane
benchmarks but seems useful to have anyway.
By doing this, we can remove all the special-case checks for `length`
from the generic GetById and GetByIdWithThis code paths, making every
other property lookup a bit faster.
Small progressions on most benchmarks, and some larger progressions:
- 12% on Octane/crypto.js
- 6% on Kraken/ai-astar.js
The VERIFY assumed that we either have a finally block which we already
visited through a previous boundary or have no finally block what so
ever; But since 301a1fc763 we propagate
finalizers through unwind scopes breaking that assumption.
It wasn't safe to use addition_would_overflow(a, -b) to check if
subtraction (a - b) would overflow, since it doesn't cover this case.
I don't know why we didn't have subtraction_would_overflow(), so this
patch adds it. :^)
With this only `ContinuePendingUnwind` needs to dynamically check if a
scheduled return needs to go through a `finally` block, making the
interpreter loop a bit nicer