RegExpInitialize specifies how the pattern string should be created
before passing it to [[RegExpMatcher]]. Rather than passing it as-is,
the string should be converted to code points and back to a "List" (if
the Unicode flag is present), or as a "List" of UTF-16 code units.
Further. the spec requires that we keep both the original pattern string
and this parsed string in the RegExp object.
The caveat is that the LibRegex parser further requires any multi-byte
code units to be escaped (as "\unnnn"). Otherwise, the code unit is
recognized as individual UTF-8 bytes.
This removes all usages of the non-standard put helper method and
replaces all of it's usages with the specification required alternative
or with define_direct_property where appropriate.
This removes all usages of the non-standard define_property helper
method and replaces all it's usages with the specification required
alternative or with define_direct_property where appropriate.
This is a huge patch, I know. In hindsight this perhaps could've been
done slightly more incremental, but I started and then fixed everything
until it worked, and here we are. I tried splitting of some completely
unrelated changes into separate commits, however. Anyway.
This is a rewrite of most of Object, and by extension large parts of
Array, Proxy, Reflect, String, TypedArray, and some other things.
What we already had worked fine for about 90% of things, but getting the
last 10% right proved to be increasingly difficult with the current code
that sort of grew organically and is only very loosely based on the
spec - this became especially obvious when we started fixing a large
number of test262 failures.
Key changes include:
- 1:1 matching function names and parameters of all object-related
functions, to avoid ambiguity. Previously we had things like put(),
which the spec doesn't have - as a result it wasn't always clear which
need to be used.
- Better separation between object abstract operations and internal
methods - the former are always the same, the latter can be overridden
(and are therefore virtual). The internal methods (i.e. [[Foo]] in the
spec) are now prefixed with 'internal_' for clarity - again, it was
previously not always clear which AO a certain method represents,
get() could've been both Get and [[Get]] (I don't know which one it
was closer to right now).
Note that some of the old names have been kept until all code relying
on them is updated, but they are now simple wrappers around the
closest matching standard abstract operation.
- Simplifications of the storage layer: functions that write values to
storage are now prefixed with 'storage_' to make their purpose clear,
and as they are not part of the spec they should not contain any steps
specified by it. Much functionality is now covered by the layers above
it and was removed (e.g. handling of accessors, attribute checks).
- PropertyAttributes has been greatly simplified, and is being replaced
by PropertyDescriptor - a concept similar to the current
implementation, but more aligned with the actual spec. See the commit
message of the previous commit where it was introduced for details.
- As a bonus, and since I had to look at the spec a whole lot anyway, I
introduced more inline comments with the exact steps from the spec -
this makes it super easy to verify correctness.
- East-const all the things.
As a result of all of this, things are much more correct but a bit
slower now. Retaining speed wasn't a consideration at all, I have done
no profiling of the new code - there might be low hanging fruits, which
we can then harvest separately.
Special thanks to Idan for helping me with this by tracking down bugs,
updating everything outside of LibJS to work with these changes (LibWeb,
Spreadsheet, HackStudio), as well as providing countless patches to fix
regressions I introduced - there still are very few (we got it down to
5), but we also get many new passing test262 tests in return. :^)
Co-authored-by: Idan Horowitz <idan.horowitz@gmail.com>
Specifically, this now explicitly takes the length, adds missing
exceptions checks to calls with user-supplied lengths, takes and uses
the prototype argument, and fixes some spec non-conformance in
ArrayConstructor and its native functions around the use of ArrayCreate
The parser doesn't always track lexical scopes correctly, so let's not
rely on that for direct argument loading.
This reverts the LoadArguments bytecode instruction as well. We can
bring these things back when the parser can reliably tell us that
a given Identifier is indeed a function argument.
To better follow the spec, we need to distinguish between the current
execution context's lexical environment and variable environment.
This patch moves us to having two record pointers, although both of
them point at the same environment records for now.
This patch makes the following name changes:
- ScopeObject => EnvironmentRecord
- LexicalEnvironment => DeclarativeEnvironmentRecord
- WithScope => ObjectEnvironmentRecord
This now matches the spec's OrdinaryObjectCreate() across the board:
instead of implicitly setting the created object's prototype to
%Object.prototype% and then in many cases setting it to a nullptr right
away, it now has an 'Object* prototype' parameter with _no default
value_. This makes the code easier to compare with the spec, very clear
in terms of what prototype is being used as well as avoiding unnecessary
shape transitions.
Also fixes a couple of cases were we weren't setting the correct
prototype.
There's no reason to assume that the object would not be empty (as in
having own properties), so let's follow our existing pattern of
Type::create(...) and simply call it 'create'.
This commit adds a bunch of passes, the most interesting of which is a
pass that merges blocks together, and a pass that places blocks that
flow into each other next to each other, and a very simply pass that
removes duplicate basic blocks.
Note that this does not remove the jump at the end of each block in that
pass to avoid scope creep in the passes.
These are pretty hairy if someone forgets to override one, as the
catchall function in Instruction will keep calling itself over and over
again, leading to really hard-to-debug situations.
This is generated for Identifier nodes that represent a function
argument variable. It loads a given argument index from the current
call frame into the accumulator.
This patch adds a CallType to the Bytecode::Op::Call instruction,
which can be either Call or Construct. We then generate Construct
calls for the NewExpression AST node.
When executed, these get fed into VM::construct().
This adds a new PushLexicalEnvironment instruction that creates a new
LexicalEnvironment and pushes it on the VM's scope stack.
There is no corresponding PopLexicalEnvironment instruction yet,
so this will behave incorrectly with let/const scopes for example.
This replaces Bytecode::Op::EnterScope with a new NewFunction op that
instantiates a ScriptFunction from a given FunctionNode (AST).
This is then used to instantiate the local functions directly from
bytecode when entering a ScopeNode. :^)
EnterUnwindContext pushes an unwind context (exception handler and/or
finalizer) onto a stack.
LeaveUnwindContext pops the unwind context from that stack.
Upon return to the interpreter loop we check whether the VM has an
exception pending. If no unwind context is available we return from the
loop. If an exception handler is available we clear the VM's exception,
put the exception value into the accumulator register, clear the unwind
context's handler and jump to the handler. If no handler is available
but a finalizer is available we save the exception value + metadata (for
later use by ContinuePendingUnwind), clear the VM's exception, pop the
unwind context and jump to the finalizer.
ContinuePendingUnwind checks whether a saved exception is available. If
no saved exception is available it jumps to the resume label. Otherwise
it stores the exception into the VM.
The Jump after LeaveUnwindContext could be integrated into the
LeaveUnwindContext instruction. I've kept them separate for now to make
the bytecode more readable.
> try { 1; throw "x" } catch (e) { 2 } finally { 3 }; 4
1:
[ 0] EnterScope
[ 10] EnterUnwindContext handler:@4 finalizer:@3
[ 38] EnterScope
[ 48] LoadImmediate 1
[ 60] NewString 1 ("x")
[ 70] Throw
<for non-terminated blocks: insert LeaveUnwindContext + Jump @3 here>
2:
[ 0] LoadImmediate 4
3:
[ 0] EnterScope
[ 10] LoadImmediate 3
[ 28] ContinuePendingUnwind resume:@2
4:
[ 0] SetVariable 0 (e)
[ 10] EnterScope
[ 20] LoadImmediate 2
[ 38] LeaveUnwindContext
[ 3c] Jump @3
String Table:
0: e
1: x
Instead of using Strings in the bytecode ops this adds a global string
table to the Executable struct which individual operations can refer
to using indices. This brings bytecode ops one step closer to being
pointer free.
Added Increment and Decrement bytecode ops to support this. Postfix
updates use a temporary register to preserve the original value.
Note that this patch only implements Identifier updates. Member
expression updates are a TODO.
This limits the size of each block (currently set to 1K), and gets us
closer to a canonical, more easily analysable bytecode format.
As a result of this, "Labels" are now simply entries to basic blocks.
Since there is no more 'conditional' jump (as all jumps are always
taken), JumpIf{True,False} are unified to JumpConditional, and
JumpIfNullish is renamed to JumpNullish.
Also fixes#7914 as a result of reimplementing the loop logic.
This commit introduces the concept of an accumulator register to
LibJS's bytecode interpreter. The accumulator register is always
register 0, and most simple instructions use it for reading and
writing.
Not only does this slim down the AST, but it also simplifies a lot of
the code. For example, the generate_bytecode methods no longer need
to return an Optional<Register>, as any opcode which has a "return"
value will always put it into the accumulator.
This also renames the old Op::Load to Op::LoadImmediate, and uses
Op::Load to load from a register into the accumulator. There is
also an Op::Store to put the value in the accumulator into another
register.