As long as the inputs are Int32, we can convert them to UInt32 in a
spec-compliant way with a simple static_cast<u32>.
This allows calculations like `-3 >>> 2` to take the fast path as well,
which is extremely valuable for stuff like crypto code.
While we're doing this, also remove the fast paths from the generic
shift functions in Value.cpp, since we only end up there if we *didn't*
take the same fast path in the interpreter.
Since get() returns an empty Optional if the index is not present, we
can combine these two into a single get() operation and save the cost of
a virtual call.
This patch adds a new "Peephole" pass for performing small, local
optimizations to bytecode.
We also introduce the first such optimization, fusing a sequence of
some comparison instruction FooCompare followed by a JumpIf into a
new set of JumpFooCompare instructions.
This gives a ~50% speed-up on the following microbenchmark:
for (let i = 0; i < 10_000_000; ++i) {
}
But more traditional benchmarks see a pretty sizable speed-up as well,
for example 15% on Kraken/ai-astar.js and 16% on Kraken/audio-dft.js :^)
The entry block must stay in place, although it's okay to merge stuff
into it.
This fixes 4 test262 tests and brings us to parity with optimization
disabled. :^)
Instead of emitting a NewBigInt instruction to construct a primitive
bigint from a parsed literal, we now instantiate the BigInt on the heap
during codegen.
Instead of emitting a NewString instruction to construct a primitive
string from a parsed literal, we now instantiate the PrimitiveString on
the heap during codegen.
`var` declarations can have duplicates, but duplicate `let` or `const`
bindings are a syntax error.
Because of this, we can sink `let` and `const` directly into the
preferred_dst if available. This is not safe for `var` since the
preferred_dst may be used in the initializer.
This patch fixes the issue by simply skipping the preferred_dst
optimization for `var` declarations.
Instead of looking these up in the VM execution context stack whenever
we need them, we now just cache them in the interpreter when entering
a new call frame.
Comparing two Values has to call the generic same_value() helper,
and we can avoid this by simply using a stronger type for built-in
native function handlers.
The exisiting fast path only permits for valid i32 values.
On https://cyxx.github.io/another_js, this eliminates the runtime of
typed_array_set_element, and reduces the runtime of put_by_value from
11.1% to 7.7%.
When compiling code like this:
x = { foo: x }
We don't want to put a new JS::Object in `x` until *after* we've
evaluated `x` for the `foo` field.
This fixes an issue when loading https://puter.com/ :^)
Instead of having Call refer to a range of VM registers, it now has
a trailing list of argument operands as part of the instruction.
This means we no longer have to shuffle every argument value into
a register before making a call, making bytecode smaller & faster. :^)
By handling common cases like Int32 arithmetic directly in the
instruction handler, we can avoid the cost of calling the generic helper
functions in Value.cpp.
Instead of splitting the postfix variants into ToNumeric + Inc/Dec,
we now have dedicated PostfixIncrement and PostfixDecrement instructions
that handle both outputs in one go.
This patch moves us away from the accumulator-based bytecode format to
one with explicit source and destination registers.
The new format has multiple benefits:
- ~25% faster on the Kraken and Octane benchmarks :^)
- Fewer instructions to accomplish the same thing
- Much easier for humans to read(!)
Because this change requires a fundamental shift in how bytecode is
generated, it is quite comprehensive.
Main implementation mechanism: generate_bytecode() virtual function now
takes an optional "preferred dst" operand, which allows callers to
communicate when they have an operand that would be optimal for the
result to go into. It also returns an optional "actual dst" operand,
which is where the completion value (if any) of the AST node is stored
after the node has "executed".
One thing of note that's new: because instructions can now take locals
as operands, this means we got rid of the GetLocal instruction.
A side-effect of that is we have to think about the temporal deadzone
(TDZ) a bit differently for locals (GetLocal would previously check
for empty values and interpret that as a TDZ access and throw).
We now insert special ThrowIfTDZ instructions in places where a local
access may be in the TDZ, to maintain the correct behavior.
There are a number of progressions and regressions from this test:
A number of async generator tests have been accidentally fixed while
converting the implementation to the new bytecode format. It didn't
seem useful to preserve bugs in the original code when converting it.
Some "does eval() return the correct completion value" tests have
regressed, in particular ones related to propagating the appropriate
completion after control flow statements like continue and break.
These are all fairly obscure issues, and I believe we can continue
working on them separately.
The net test262 result is a progression though. :^)
This is pure prep work for refactoring the bytecode to use more operands
instead of only registers.
generate_bytecode() virtuals now return an Optional<Operand>, and the
idea is to return an Operand referring to the value produced by this
AST node.
They also take an Optional<Operand> "preferred_dst" input. This is
intended to communicate the caller's preference for an output operand,
if any. This will be used to elide temporaries when we can store the
result directly in a local, for example.
The JIT compiler was an interesting experiment, but ultimately the
security & complexity cost of doing arbitrary code generation at runtime
is far too high.
In subsequent commits, the bytecode format will change drastically, and
instead of rewriting the JIT to fit the new bytecode, this patch simply
removes the JIT instead.
Other engines, JavaScriptCore in particular, have already proven that
it's possible to handle the vast majority of contemporary web content
with an interpreter. They are currently ~5x faster than us on benchmarks
when running without a JIT. We need to catch up to them before
considering performance techniques with a heavy security cost.
Previously, attempting to load a value from an invalid reference would
cause a crash. We now return a CodeGenerationError rather than hitting
an assertion. This is not a complete solution, as ideally we would want
to return a ReferenceError, but this now matches the behavior we see
when we attempt to store something to an invalid reference.
perform_call() wants a ReadonlySpan<Value>, so just grab a slice of the
current register window instead of making a MarkedVector.
10% speed-up on this function call microbenchmark:
function callee(a, b, c) { }
function caller(callee) {
for (let i = 0; i < 10_000_000; ++i)
callee(1, 2, 3)
}
caller(callee)
Previously, constructing a `UnsignedBigInteger::from_base()` could
produce an incorrect result if the input string contained a valid
Base36 digit that was out of range of the given base. The same method
would also crash if the input string contained an invalid Base36 digit.
An error is now returned in both these cases.
Constructing a BigFraction from string is now also fallible, so that we
can handle the case where we are given an input string with invalid
digits.
The IsValidIntegerIndex AO performs the checks we are interested in. The
manual implementation we currently have will no longer compile once the
resizable ArrayBuffer spec is implemented. The AO will be updated with
the spec implementation, so let's use it now to avoid breakage.
This renames IntegerIndexedElementGet to TypedArrayGetElement, and
IntegerIndexedElementSet to TypedArraySetElement.
This also renames the indexedPosition variable inside these method
definitions to byteIndexInBuffer.
These are part of a couple editorial changes in the ECMA-262 spec. See:
https://github.com/tc39/ecma262/commit/03e4410https://github.com/tc39/ecma262/commit/a1a4d48
The remainder of the changes in those commits apply to the resizable
ArrayBuffer spec, which is not implemented in LibJS as of this commit.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
We previously had a concept of unique shapes, which meant that they
couldn't be shared between multiple objects.
Object shapes became unique in three situations:
- They were the shape of the global object.
- They had more than 100 properties added to them.
- They had one or more properties deleted from them.
Unfortunately, unique shapes presented an annoying problem for inline
caches, and we added a "unique shape serial number" for being able to
tell that a unique shape had been mutated.
This patch gets rid of the concept of unique shapes, simplifying all
the caching code, since inline caches can now simply perform a shape
check and then we're good.
To make this possible, we now have the concept of delete transitions,
which occur when a property is deleted from a shape.
Note that this patch by itself introduces a performance regression in
some situtations, since we now create a lot more shapes, and marking
their property keys can be very heavy. This will be addressed in a
subsequent patch.
These are just like Uint8Array, except Put values have to be clamped
in the 0..255 range.
Takes CPU usage from 40% to 30% on the "Canvas Cycle" demo at
http://www.effectgames.com/demos/canvascycle/ :^)
When iterating over an iterable, we get back a JS object with the fields
"value" and "done".
Before this change, we've had two dedicated instructions for retrieving
the two fields: IteratorResultValue and IteratorResultDone. These had no
fast path whatsoever and just did a generic [[Get]] access to fetch the
corresponding property values.
By replacing the instructions with GetById("value") and GetById("done"),
they instantly get caching and JIT fast paths for free, making iterating
over iterables much faster. :^)
26% speed-up on this microbenchmark:
function go(a) {
for (const p of a) {
}
}
const a = [];
a.length = 1_000_000;
go(a);
This patch makes IteratorRecord an Object. Although it's not exposed to
author code, this does allow us to store it in a VM register.
Now that we can store it in a VM register, we don't need to convert it
back and forth between IteratorRecord and Object when accessing it from
bytecode.
The big win here is avoiding 3 [[Get]] accesses on every iteration step
of for..of loops. There are also a bunch of smaller efficiencies gained.
20% speed-up on this microbenchmark:
function go(a) {
for (const p of a) {
}
}
const a = [];
a.length = 1_000_000;
go(a);