Merging registers, constants and locals into single vector means:
- Better data locality
- No need to check type in Interpreter::get() and Interpreter::set()
which are very hot functions
Performance improvement is visible in almost all Octane and Kraken
tests.
Now that the interpreter is unrolled, we can advance the program counter
manually based on the current instruction type.
This makes most instructions a bit smaller. :^)
Instead of storing source offsets with each instruction, we now keep
them in a side table in Executable.
This shrinks each instruction by 8 bytes, further improving locality.
Instead of keeping bytecode as a set of disjoint basic blocks on the
malloc heap, bytecode is now a contiguous sequence of bytes(!)
The transformation happens at the end of Bytecode::Generator::generate()
and the only really hairy part is rerouting jump labels.
This required solving a few problems:
- The interpreter execution loop had to change quite a bit, since we
were storing BasicBlock pointers all over the place, and control
transfer was done by redirecting the interpreter's current block.
- Exception handlers & finalizers are now stored per-bytecode-range
in a side table in Executable.
- The interpreter now has a plain program counter instead of a stream
iterator. This actually makes error stack generation a bit nicer
since we just have to deal with a number instead of reaching into
the iterator.
This yields a 25% performance improvement on this microbenchmark:
for (let i = 0; i < 1_000_000; ++i) { }
But basically everything gets faster. :^)
This works by adding source start/end offset to every bytecode
instruction. In the future we can make this more efficient by keeping
a map of bytecode ranges to source ranges in the Executable instead,
but let's just get traces working first.
Co-Authored-By: Andrew Kaster <akaster@serenityos.org>
This change removes the mmap inside of Block in favor of a growing
vector of bytes. This is favorable for two reasons:
- We don't take more space than we need
- There is no limit to the growth of the vector (previously, if
the Block overstepped its 64kb boundary, it would just crash)
However, if that vector happens to resize, any pointer pointing into
that vector would become invalid. To avoid this, this commit adds an
InstructionHandle<Op> class which just stores a block and an offset
into that block.
This patch changes the LibJS bytecode to be a stream of instructions
packed one-after-the-other in contiguous memory, instead of a vector
of OwnPtr<Instruction>. This should be a lot more cache-friendly. :^)
Instructions are also devirtualized and instead have a type field
using a new Instruction::Type enum.
To iterate over a bytecode stream, one must now use
Bytecode::InstructionStreamIterator.
This patch begins the work of implementing JavaScript execution in a
bytecode VM instead of an AST tree-walk interpreter.
It's probably quite naive, but we have to start somewhere.
The basic idea is that you call Bytecode::Generator::generate() on an
AST node and it hands you back a Bytecode::Block filled with
instructions that can then be interpreted by a Bytecode::Interpreter.
This first version only implements two instructions: Load and Add. :^)
Each bytecode block has infinity registers, and the interpreter resizes
its register file to fit the block being executed.
Two new `js` options are added in this patch as well:
`-d` will dump the generated bytecode
`-b` will execute the generated bytecode
Note that unless `-d` and/or `-b` are specified, none of the bytecode
related stuff in LibJS runs at all. This is implemented in parallel
with the existing AST interpreter. :^)