Otherwise we crash the interpreter when an exception is thrown during
evaluation of the while or do/while test expression - which is easily
caused by a ReferenceError - e.g.:
while (someUndefinedVariable) {
// ...
}
A large number of JS strings are a single ASCII character. This patch
adds a 128-entry cache for those strings to the VM. The cost of the
cache is 1536 byte of GC heap (all in same block) + 2304 bytes malloc.
This avoids a lot of GC heap allocations, and packing all of these
in the same heap block is nice for fragmentation as well.
This allows us to provide better error messages as we can point the
syntax error location to the exact first invalid parameter instead of
always the end of the function within a object literal or class
definition.
Before this change:
const Foo = { set bar() {} }
^
Uncaught exception: [SyntaxError]: Object setter property must have one argument (line: 1, column: 28)
class Foo { set bar() {} }
^
Uncaught exception: [SyntaxError]: Class setter method must have one argument (line: 1, column: 26)
After this change:
const Foo = { set bar() {} }
^
Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 23)
class Foo { set bar() {} }
^
Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 21)
The only possible downside of this change is that class getters/setters
and functions in objects are not distinguished in the message anymore -
I don't think that's important though, and classes are (mostly) just
syntactic sugar anyway.
I'm about to add even more options and a bunch of unnamed true/false
arguments is really not helpful. Let's make this a single parse options
parameter using bit flags.
This provides a huge speed-up for objects with large numbers as property
keys in some situation. Previously we would simply iterate from 0-<max>
and check if there's a non-empty value at each index - now we're being
smarter and compute a list of non-empty indices upfront, by checking
each value in the packed elements vector and appending the sparse
elements hashmap keys (for GenericIndexedPropertyStorage).
Consider this example, an object with a single own property, which is a
number increasing by a factor of 10 each iteration:
for (let i = 0; i < 10; ++i) {
const o = {[10 ** i]: "foo"};
const start = Date.now();
Object.getOwnPropertyNames(o); // <-- IndexedPropertyIterator
const end = Date.now();
console.log(`${10 ** i} -> ${(end - start) / 1000}s`);
}
Before this change:
1 -> 0.0000s
10 -> 0.0000s
100 -> 0.0000s
1000 -> 0.0000s
10000 -> 0.0005s
100000 -> 0.0039s
1000000 -> 0.0295s
10000000 -> 0.2489s
100000000 -> 2.4758s
1000000000 -> 25.5669s
After this change:
1 -> 0.0000s
10 -> 0.0000s
100 -> 0.0000s
1000 -> 0.0000s
10000 -> 0.0000s
100000 -> 0.0000s
1000000 -> 0.0000s
10000000 -> 0.0000s
100000000 -> 0.0000s
1000000000 -> 0.0000s
Fixes#3805.
If there's a newline between the closing paren and arrow it's not a
valid arrow function, ASI should kick in instead (it'll then fail with
"Unexpected token Arrow")
This simplifies try_parse_arrow_function_expression() and fixes a few
cases that should not produce an arrow function AST but did:
(a,,) => {}
(a b) => {}
(a ...b) => {}
(...b a) => {}
The new parsing logic checks whether parens are expected and uses
parse_function_parameters() if so, rolling back if a new syntax error
occurs during that. Otherwise it's just an identifier in which case we
parse the single parameter ourselves.
It would be cool to solve this in a general way so that looking up
a string literal or StringView in a HashMap with String keys avoids
creating a temp string.
For now, this patch simply addresses the issue in JS::Lexer.
This is a 2-3% speed-up on test-js.
Instead of performing a prototype transition for every new object we
create via {}, prebake the object returned by Object::create_empty()
with a shape with ObjectPrototype as the prototype.
We also prebake the shape for the object assigned to the "prototype"
property of new ScriptFunction objects, since those are extremely
common and that code broke from this change anyway.
This avoid a large number of transitions and is a small speed-up on
test-js.
This is not actually necessary, since no GC allocations are made during
this process. If we ever make property tables into heap cells, we'd
have to rethink this.
This assumption only works for the m_packed_elements Vector where a
missing value at a certain index still returns an empty value, but not
for the m_sparse_elements HashMap, which is being used for indices
>= 200 - in that case the Optional<ValueAndAttributes> result will not
have a value.
This fixes a crash in the js REPL where printing an array with a hole at
any index >= 200 would crash.
When we're initializing objects, we're just adding a bunch of new
properties, without transition, and without overlap (we never add
the same property twice.)
Take advantage of this by skipping lookups entirely (no need to see
if we're overwriting an existing property) during initialization.
Another nice test-js speedup :^)
Roughly 7% of test-js runtime was spent creating FlyStrings from string
literals. This patch frontloads that work and caches all the commonly
used names in LibJS on a CommonPropertyNames struct that hangs off VM.
When changing the attributes of an existing property of an object with
unique shape we must not change the PropertyMetadata offset.
Doing so without resizing the underlying storage vector caused an OOB
write crash.
Fixes#3735.
Previously, when a loop detected an unwind of type ScopeType::Function
(which means a return statement was executed inside of the loop), it
would just return undefined. This set the VM's last_value to undefined,
when it should have been the returned value. This patch makes all loop
statements return the appropriate value in the above case.
'continue' is no longer allowed outside of a loop, and an unlabeled
'break' is not longer allowed outside of a loop or switch statement.
Labeled 'break' statements are still allowed everywhere, even if the
label does not exist.
Instead of keeping all the HeapBlocks in one big list, we now split it
into two levels:
- Heap has a set of Allocators, each with a specific cell size.
- Allocators have two lists of blocks, "full" and "usable".
Allocating a new cell no longer has to scan the entire set of blocks,
but instead just needs to find the right allocator and then pop a cell
from its freelist. If all the blocks in the allocator are full, a new
block will be created.
Blocks are moved from the "full" to "usable" list after sweeping has
determined that they are not completely empty and not completely full.
There are certainly many ways we can improve on this. This patch is
mostly about getting the new allocator architecture in place. :^)
While initialization common runtime objects like functions, prototypes,
etc, we don't really care about tracking transitions for each and every
property added to them.
This patch puts objects into a "disable transitions" mode while we call
initialize() on them. After that, adding more properties will cause new
transitions to be generated and added to the chain.
This gives a ~10% speed-up on test-js. :^)
When reifying a shape transition chain, look for the nearest previous
shape in the transition chain that has a property table already, and
use that as the starting point.
This achieves two things:
1. We do less work when reifying property tables that already have
partial property tables earlier in the chain.
2. This enables adding properties to a shape without performing a
transition. This will be useful for initializing runtime objects
with way fewer allocations. See next patch. :^)
The check for invalid lhs and assignment to eval/arguments in strict
mode should happen for all kinds of assignment expressions, not just
AssignmentOp::Assignment.
So far we have three different syntax highlighters for LibJS:
- js's Line::Editor stylization
- JS::MarkupGenerator
- GUI::JSSyntaxHighlighter
This not only caused repetition of most token types in each highlighter
but also a lot of inconsistency regarding the styling of certain tokens:
- JSSyntaxHighlighter was considering TokenType::Period to be an
operator whereas MarkupGenerator categorized it as punctuation.
- MarkupGenerator was considering TokenType::{Break,Case,Continue,
Default,Switch,With} control keywords whereas JSSyntaxHighlighter just
disregarded them
- MarkupGenerator considered some future reserved keywords invalid and
others not. JSSyntaxHighlighter and js disregarded most
Adding a new token type meant adding it to ENUMERATE_JS_TOKENS as well
as each individual highlighter's switch/case construct.
I added a TokenCategory enum, and each TokenType is now associated to a
certain category, which the syntax highlighters then can use for styling
rather than operating on the token type directly. This also makes
changing a token's category everywhere easier, should we need to do that
(e.g. I decided to make TokenType::{Period,QuestionMarkPeriod}
TokenCategory::Operator for now, but we might want to change them to
Punctuation.
There's no point in trying to achieve shape sharing for global objects,
so we can simply make the shape unique from the start and avoid making
a transition chain.
Previously whenever you would ask a Shape how many properties it had,
it would reify the property table into a HashMap and use HashMap::size()
to answer the question.
This can be a huge waste of time if we don't need the property table for
anything else, so this patch implements property count tracking in a
separate integer member of Shape. :^)
Since blocks can't be strict by themselves, it makes no sense for them
to store whether or not they are strict. Strict-ness is now stored in
the Program and FunctionNode ASTNodes. Fixes issue #3641
When scanning for potential heap pointers during conservative GC,
we look for any value that is an address somewhere inside a heap cell.
However, we were failing to account for the slack at the end of a
block (which occurs whenever the block storage size isn't an exact
multiple of the cell size.) Pointers inside the trailing slack were
misidentified as pointers into "last_cell+1".
Instead of skipping over them, we would treat this garbage data as a
live cell and try to mark it. I believe this is the test-js crash that
has been terrorizing Travis for a while. :^)
Each JS global object has its own "console", so it makes more sense to
store it in GlobalObject.
We'll need some smartness later to bundle up console messages from all
the different frames that make up a page later, but this works for now.
More work on decoupling the general runtime from Interpreter. The goal
is becoming clearer. Interpreter should be one possible way to execute
code inside a VM. In the future we might have other ways :^)
Okay, my vision here is improving. Interpreter should be a thing that
executes an AST. The scope stack is irrelevant to the VM proper,
so we can move that to the Interpreter. Same with execute_statement().
This patch moves the exception state, call stack and scope stack from
Interpreter to VM. I'm doing this to help myself discover what the
split between Interpreter and VM should be, by shuffling things around
and seeing what falls where.
With these changes, we no longer have a persistent lexical environment
for the current global object on the Interpreter's call stack. Instead,
we push/pop that environment on Interpreter::run() enter/exit.
Since it should only be used to find the global "this", and not for
variable storage (that goes directly into the global object instead!),
I had to insert some short-circuiting when walking the environment
parent chain during variable lookup.
Note that this is a "stepping stone" commit, not a final design.
Empty string is extremely common and we can avoid a lot of heap churn
by simply caching one in the VM. Primitive strings are immutable anyway
so there is no observable behavior change outside of fewer collections.
To make it a little clearer what this is for. (This is an RAII helper
class for adding and removing an Interpreter to a VM's list of the
currently active (executing code) Interpreters.)
Taking a big step towards a world of multiple global object, this patch
adds a new JS::VM object that houses the JS::Heap.
This means that the Heap moves out of Interpreter, and the same Heap
can now be used by multiple Interpreters, and can also outlive them.
The VM keeps a stack of Interpreter pointers. We push/pop on this
stack when entering/exiting execution with a given Interpreter.
This allows us to make this change without disturbing too much of
the existing code.
There is still a 1-to-1 relationship between Interpreter and the
global object. This will change in the future.
Ultimately, the goal here is to make Interpreter a transient object
that only needs to exist while you execute some code. Getting there
will take a lot more work though. :^)
Note that in LibWeb, the global JS::VM is called main_thread_vm(),
to distinguish it from future worker VM's.
This one is a little weird. I don't know why it's okay for this
function to assume that there is a current scope on the scope stack
when it can be called during global object initialization etc.
For now, just make it say "we are in strict mode" when there is no
currently active scope.
Shape was allocating property tables inside visit_children(), which
could cause garbage collection to happen. It's not very good to start
a new garbage collection while you are in the middle of one already.
In the case of an exception in a property getter function we would not
return early, and a subsequent attempt to call the replacer function
would crash the interpreter due to call_internal() asserting.
Fixes#3548.
The parser considers it a syntax error at the moment, other engines
throw a ReferenceError during runtime for ++foo(), --foo(), foo()++ and
foo()--, so I assume the spec defines this.
This fixes two cases obj[expr] and obj[expr]() (MemberExpression and
CallExpression respectively) when expr throws an exception and results
in an empty value, causing a crash by passing the invalid PropertyName
created by computed_property_name() to Object::get() without checking it
first.
Fixes#3459.
This fixes two issues with running a TryStatement finalizer:
- Temporarily store and clear the exception, if any, so we can run the
finalizer block statement without it getting in our way, which could
have unexpected side effects otherwise (and will likely return early
somewhere).
- Stop unwinding so more than one child node of the finalizer
BlockStatement is executed if an exception has been thrown previously
(which would have called unwind(ScopeType::Try)). Re-throwing as
described above ensures we still unwind after the finalizer, if
necessary.
Also add some tests specifically for try/catch/finally blocks, we
didn't have any!
Interpreter::run() was so far being used both as the "public API entry
point" for running a JS::Program as well as internally to execute
JS::Statement|s of all kinds - this is now more distinctly separated.
A program as returned by the parser is still going through run(), which
is responsible for creating the initial global call frame, but all other
statements are executed via execute_statement() directly.
Fixes#3437, a regression introduced by adding ASSERT(!exception()) to
run() without considering the effects that would have on internal usage.
This broke in case of unterminated regular expressions, causing goofy location
numbers, and 'source_location_hint' to eat up all memory:
Unexpected token UnterminatedRegexLiteral. Expected statement (line: 2, column: 4294967292)
Looks like an oversight to me - we were not actually setting a new value
for m_array_size, which would cause arrays created with generic storage
to report a .length of 0.
This is already considered in put()/insert()/append_all() but not
set_array_like_size(), which crashed the interpreter with an assertion
when creating an array with more than SPARSE_ARRAY_THRESHOLD (200)
initial elements as the simple storage was being resized beyond its
limit.
Fixes#3382.
Before, we had about these occurrence counts:
COPY: 13 without, 33 with
MOVE: 12 without, 28 with
Clearly, 'with' was the preferred way. However, this introduced double-semicolons
all over the place, and caused some warnings to trigger.
This patch *forces* the usage of a semi-colon when calling the macro,
by removing the semi-colon within the macro. (And thus also gets rid
of the double-semicolon.)
The fact that a `MarkedValueList` had to be created was just annoying,
so here's an alternative.
This patchset also removes some (now) unneeded MarkedValueList.h includes.
The motivation for this change is twofold:
- Returning a JS::Value is misleading as one would expect it to carry
some meaningful information, like maybe the error object that's being
created, but in fact it is always empty. Supposedly to serve as a
shortcut for the common case of "throw and return empty value", but
that's just leading us to my second point.
- Inconsistent usage / coding style: as of this commit there are 114
uses of throw_exception() discarding its return value and 55 uses
directly returning the call result (in LibJS, not counting LibWeb);
with the first style often having a more explicit empty value (or
nullptr in some cases) return anyway.
One more line to always make the return value obvious is should be
worth it.
So now it's basically always these steps, which is already being used in
the majority of cases (as outlined above):
- Throw an exception. This mutates interpreter state by updating
m_exception and unwinding, but doesn't return anything.
- Let the caller explicitly return an empty value, nullptr or anything
else itself.
The tzset documentation says that TZ allows a per-second local timezone,
so don't be needlessly clever here.
No observable behavior difference at this point, but if we ever
implement tzset, this will have a small effect.
Test files created with:
$ for f in Libraries/LibJS/Tests/builtins/Date/Date.prototype.get*js; do
cp $f $(echo $f | sed -e 's/get/getUTC/') ;
done
$ rm Libraries/LibJS/Tests/builtins/Date/Date.prototype.getUTCTime.js
$ git add Libraries/LibJS/Tests/builtins/Date/Date.prototype.getUTC*.js
$ ls Libraries/LibJS/Tests/builtins/Date/Date.prototype.getUTC*.js | \
xargs sed -i -e 's/get/getUTC/g'
Year computation has to be based on seconds, not days, in case
t is < 0 but t / __seconds_per_day is 0.
Year computation also has to consider negative timestamps.
With this, days is always positive and <= the number of days in the
year, so base the tm_wday computation directly on the timestamp,
and do it first, before t is modified in the year computation.
In C, % can return a negative number if the left operand is negative,
compensate for that.
Tested via test-js. (Except for tm_wday, since we don't implement
Date.prototype.getUTCDate() yet.)
LibJS doesn't store stacks for exception objects, so this
only amends test-common.js's __expect() with an optional
`details` function that can produce a more detailed error
message, and it lets test-js.cpp read and print that
error message. I added the optional details parameter to
a few matchers, most notably toBe() where it now prints
expected and actual value.
It'd be nice to have line numbers of failures, but that
seems hard to do with the current design, and this is already
much better than the current state.
The constructor with a string argument isn't implemented yet, but
this implements the other variants.
The timestamp constructor doens't handle negative timestamps correctly.
Out-of-bound and invalid arguments aren't handled correctly.
The optional 2nd and 3rd arguments are not yet implemented.
This assumes that `this` is the Array constructor and doesn't yet
implement the more general behavior in the ES6 spec that allows
transferring this method to other constructors.
You can now pass print_report=true to Heap::collect_garbage() and it
will print out a little summary of the time spent, and counts of
live vs freed cells and blocks.
The SI prefixes "k", "M", "G" mean "10^3", "10^6", "10^9".
The IEC prefixes "Ki", "Mi", "Gi" mean "2^10", "2^20", "2^30".
Let's use the correct name, at least in code.
Only changes the name of the constants, no other behavior change.
Decorated Interpreter::call() with [[nodiscard]] to provoke thinking
about the returned value at each call site. This is definitely not
perfect and we should really start thinking about slimming down the
public-facing LibJS interpreter API.
Fixes#3136.
This is being used in match_identifier_name(), for example when parsing
property keys - the list was incomplete, likely as some token types were
added later, leading to some unexpected syntax errors:
> var e = {};
undefined
> e.extends = "a";
e.extends = "a";
^
Uncaught exception: [SyntaxError]: Unexpected token Extends. Expected IdentifierName (line: 1, column: 3)
Fixes#3128.
This is to prevent bugs like #3091 (fixed in
9810f8872c21eaf2aefff25347d957cd26f34c2d) in the future; we generally
don't want Interpreter::run() to be called if the interpreter still has
an exception stored. Sure, it could clear those itself but letting users
of the interpreter do it explicitly seems sensible.
More steps towards multiple global object support. Primitive cells
like strings, bigints, etc, don't actually have any connection to
the global object. Use the explicit API to clarify this.
btoa() takes a byte string, so it must decode the UTF-8 argument into
a Vector<u8> before calling encode_base64.
Likewise, in atob() decode_base64 returns a byte string, so that needs
to be converted to UTF-8.
With this, `btoa(String.fromCharCode(255))` is '/w==' as it should
be, and `atob(btoa(String.fromCharCode(255))) == String.fromCharCode(255)`
remains true.
With this, typing `"\xff"` into Browser's console no longer
makes the app crash.
While here, also make the \u handler call append_codepoint()
instead of calling an overload where it's not immediately clear
which overload is getting called. This has no behavior change.
It's broken for strings with characters outside 7-bit ASCII, but
it's broken in the same way as several existing functions (e.g.
charAt()), so that's probably ok for now.
Finally use Symbol.iterator protocol in language features :) currently
only used in for-of loops and spread expressions, but will have more
uses later (Maps, Sets, Array.from, etc).
Not only is this a much nicer api (can't pass a typo'd string into the
get_well_known_symbol function), it is also a bit more performant since
there are no hashmap lookups.
With the addition of symbol keys, work can now be done on starting to
implement the well-known symbol functionality. The most important of
these well-known symbols is by far Symbol.iterator.
This patch adds IteratorPrototype, as well as ArrayIterator and
ArrayIteratorPrototype. In the future, sometime after StringIterator has
also been added, this will allow us to use Symbol.iterator directly in
for..of loops, enabling the use of custom iterator objects. Also makes
adding iterator support to native objects much easier (as will have to
be done for Map and Set, when they get added).
Skipped tests count as a "pass" rather than a "fail" (i.e. a test suite
with a skipped test will pass), however it does display a message when
the test is printing.
This is intended for tests which _should_ work, but currently do not.
This should be preferred over "// FIXME" notes if possible.
This commit also exposes JSONObject's implementation of stringify to the
public, so that it can be used by test-js without having to go through
the interpreter's environment.
This moves most of the work from run-tests.sh to test-js.cpp. This way,
we have a lot more control over how the test suite runs, as well as how
it outputs. This should result in some cool functionality!
This commit also refactors test-common.js to mimic the jest library.
This should allow tests to be much more expressive :)
- Use emojis instead of the pass/fail text
- Fix the new version of the script to run inside Serenity
- Don't print erroneous output after 'Output:'; start on a newline
instead
- Skip 'run-tests.sh' while testing
literal methods; add EnvrionmentRecord fields and methods to
LexicalEnvironment
Adding EnvrionmentRecord's fields and methods lets us throw an exception
when |this| is not initialized, which occurs when the super constructor
in a derived class has not yet been called, or when |this| has already
been initialized (the super constructor was already called).
This is a helper function based on the getter/setter definition logic from
ObjectExpression::execute() to look up an Accessor property if it already
exists, define a new Accessor property if it doesn't exist, and set the getter or
setter function on the Accessor.
Previously, debugging a test with console.log statements was impossible,
because it would just cause the test to fail with no additional output.
Now, if the output of a test is not "PASS", the output will be printed
under the line where the test failed.
Empty output will have a special message attached to it -- useful when
a test author has forgotten to include `console.log("PASS")` at the end
of a test.
Divide the Object constructor into three variants:
- The regular one (takes an Object& prototype)
- One for use by GlobalObject
- One for use by objects without a prototype (e.g ObjectPrototype)
Instead of taking the JS::Heap&. This allows us to get rid of some
calls to JS::Interpreter::global_object(). We're getting closer and
closer to multiple global objects. :^)
We're still missing optional argument support, so this implementation
doesn't support fill(), only fill(fill_rule).
Still it's really nice to get rid of so much hand-written wrapper code.
To allow implementing the DOM class hierarchy in JS bindings, this
patch adds an inherits() function that can be used to ask an Object
if it inherits from a specific C++ class (by name).
The necessary overrides are baked into each Object subclass by the
new JS_OBJECT macro, which works similarly to C_OBJECT in LibCore.
Thanks to @Dexesttp for suggesting this approach. :^)
Instead of only checking the class_name(), we now generate an is_foo()
virtual in the wrapper generator. (It's currently something we override
on Bindings::Wrapper, which is not really scalable.)
Longer term we'll need to think up something smarter for verifying that
one wrapper "is" another type of wrapper.