More work on decoupling the general runtime from Interpreter. The goal
is becoming clearer. Interpreter should be one possible way to execute
code inside a VM. In the future we might have other ways :^)
Okay, my vision here is improving. Interpreter should be a thing that
executes an AST. The scope stack is irrelevant to the VM proper,
so we can move that to the Interpreter. Same with execute_statement().
This patch moves the exception state, call stack and scope stack from
Interpreter to VM. I'm doing this to help myself discover what the
split between Interpreter and VM should be, by shuffling things around
and seeing what falls where.
With these changes, we no longer have a persistent lexical environment
for the current global object on the Interpreter's call stack. Instead,
we push/pop that environment on Interpreter::run() enter/exit.
Since it should only be used to find the global "this", and not for
variable storage (that goes directly into the global object instead!),
I had to insert some short-circuiting when walking the environment
parent chain during variable lookup.
Note that this is a "stepping stone" commit, not a final design.
Empty string is extremely common and we can avoid a lot of heap churn
by simply caching one in the VM. Primitive strings are immutable anyway
so there is no observable behavior change outside of fewer collections.
To make it a little clearer what this is for. (This is an RAII helper
class for adding and removing an Interpreter to a VM's list of the
currently active (executing code) Interpreters.)
Taking a big step towards a world of multiple global object, this patch
adds a new JS::VM object that houses the JS::Heap.
This means that the Heap moves out of Interpreter, and the same Heap
can now be used by multiple Interpreters, and can also outlive them.
The VM keeps a stack of Interpreter pointers. We push/pop on this
stack when entering/exiting execution with a given Interpreter.
This allows us to make this change without disturbing too much of
the existing code.
There is still a 1-to-1 relationship between Interpreter and the
global object. This will change in the future.
Ultimately, the goal here is to make Interpreter a transient object
that only needs to exist while you execute some code. Getting there
will take a lot more work though. :^)
Note that in LibWeb, the global JS::VM is called main_thread_vm(),
to distinguish it from future worker VM's.
Shape was allocating property tables inside visit_children(), which
could cause garbage collection to happen. It's not very good to start
a new garbage collection while you are in the middle of one already.
In the case of an exception in a property getter function we would not
return early, and a subsequent attempt to call the replacer function
would crash the interpreter due to call_internal() asserting.
Fixes#3548.
Interpreter::run() was so far being used both as the "public API entry
point" for running a JS::Program as well as internally to execute
JS::Statement|s of all kinds - this is now more distinctly separated.
A program as returned by the parser is still going through run(), which
is responsible for creating the initial global call frame, but all other
statements are executed via execute_statement() directly.
Fixes#3437, a regression introduced by adding ASSERT(!exception()) to
run() without considering the effects that would have on internal usage.
Looks like an oversight to me - we were not actually setting a new value
for m_array_size, which would cause arrays created with generic storage
to report a .length of 0.
This is already considered in put()/insert()/append_all() but not
set_array_like_size(), which crashed the interpreter with an assertion
when creating an array with more than SPARSE_ARRAY_THRESHOLD (200)
initial elements as the simple storage was being resized beyond its
limit.
Fixes#3382.
Before, we had about these occurrence counts:
COPY: 13 without, 33 with
MOVE: 12 without, 28 with
Clearly, 'with' was the preferred way. However, this introduced double-semicolons
all over the place, and caused some warnings to trigger.
This patch *forces* the usage of a semi-colon when calling the macro,
by removing the semi-colon within the macro. (And thus also gets rid
of the double-semicolon.)
The fact that a `MarkedValueList` had to be created was just annoying,
so here's an alternative.
This patchset also removes some (now) unneeded MarkedValueList.h includes.
The motivation for this change is twofold:
- Returning a JS::Value is misleading as one would expect it to carry
some meaningful information, like maybe the error object that's being
created, but in fact it is always empty. Supposedly to serve as a
shortcut for the common case of "throw and return empty value", but
that's just leading us to my second point.
- Inconsistent usage / coding style: as of this commit there are 114
uses of throw_exception() discarding its return value and 55 uses
directly returning the call result (in LibJS, not counting LibWeb);
with the first style often having a more explicit empty value (or
nullptr in some cases) return anyway.
One more line to always make the return value obvious is should be
worth it.
So now it's basically always these steps, which is already being used in
the majority of cases (as outlined above):
- Throw an exception. This mutates interpreter state by updating
m_exception and unwinding, but doesn't return anything.
- Let the caller explicitly return an empty value, nullptr or anything
else itself.
The tzset documentation says that TZ allows a per-second local timezone,
so don't be needlessly clever here.
No observable behavior difference at this point, but if we ever
implement tzset, this will have a small effect.
Test files created with:
$ for f in Libraries/LibJS/Tests/builtins/Date/Date.prototype.get*js; do
cp $f $(echo $f | sed -e 's/get/getUTC/') ;
done
$ rm Libraries/LibJS/Tests/builtins/Date/Date.prototype.getUTCTime.js
$ git add Libraries/LibJS/Tests/builtins/Date/Date.prototype.getUTC*.js
$ ls Libraries/LibJS/Tests/builtins/Date/Date.prototype.getUTC*.js | \
xargs sed -i -e 's/get/getUTC/g'
The constructor with a string argument isn't implemented yet, but
this implements the other variants.
The timestamp constructor doens't handle negative timestamps correctly.
Out-of-bound and invalid arguments aren't handled correctly.
The optional 2nd and 3rd arguments are not yet implemented.
This assumes that `this` is the Array constructor and doesn't yet
implement the more general behavior in the ES6 spec that allows
transferring this method to other constructors.
Decorated Interpreter::call() with [[nodiscard]] to provoke thinking
about the returned value at each call site. This is definitely not
perfect and we should really start thinking about slimming down the
public-facing LibJS interpreter API.
Fixes#3136.
More steps towards multiple global object support. Primitive cells
like strings, bigints, etc, don't actually have any connection to
the global object. Use the explicit API to clarify this.
btoa() takes a byte string, so it must decode the UTF-8 argument into
a Vector<u8> before calling encode_base64.
Likewise, in atob() decode_base64 returns a byte string, so that needs
to be converted to UTF-8.
With this, `btoa(String.fromCharCode(255))` is '/w==' as it should
be, and `atob(btoa(String.fromCharCode(255))) == String.fromCharCode(255)`
remains true.
It's broken for strings with characters outside 7-bit ASCII, but
it's broken in the same way as several existing functions (e.g.
charAt()), so that's probably ok for now.
Finally use Symbol.iterator protocol in language features :) currently
only used in for-of loops and spread expressions, but will have more
uses later (Maps, Sets, Array.from, etc).
Not only is this a much nicer api (can't pass a typo'd string into the
get_well_known_symbol function), it is also a bit more performant since
there are no hashmap lookups.
With the addition of symbol keys, work can now be done on starting to
implement the well-known symbol functionality. The most important of
these well-known symbols is by far Symbol.iterator.
This patch adds IteratorPrototype, as well as ArrayIterator and
ArrayIteratorPrototype. In the future, sometime after StringIterator has
also been added, this will allow us to use Symbol.iterator directly in
for..of loops, enabling the use of custom iterator objects. Also makes
adding iterator support to native objects much easier (as will have to
be done for Map and Set, when they get added).
This commit also exposes JSONObject's implementation of stringify to the
public, so that it can be used by test-js without having to go through
the interpreter's environment.
literal methods; add EnvrionmentRecord fields and methods to
LexicalEnvironment
Adding EnvrionmentRecord's fields and methods lets us throw an exception
when |this| is not initialized, which occurs when the super constructor
in a derived class has not yet been called, or when |this| has already
been initialized (the super constructor was already called).
This is a helper function based on the getter/setter definition logic from
ObjectExpression::execute() to look up an Accessor property if it already
exists, define a new Accessor property if it doesn't exist, and set the getter or
setter function on the Accessor.
Divide the Object constructor into three variants:
- The regular one (takes an Object& prototype)
- One for use by GlobalObject
- One for use by objects without a prototype (e.g ObjectPrototype)
Instead of taking the JS::Heap&. This allows us to get rid of some
calls to JS::Interpreter::global_object(). We're getting closer and
closer to multiple global objects. :^)
We're still missing optional argument support, so this implementation
doesn't support fill(), only fill(fill_rule).
Still it's really nice to get rid of so much hand-written wrapper code.
To allow implementing the DOM class hierarchy in JS bindings, this
patch adds an inherits() function that can be used to ask an Object
if it inherits from a specific C++ class (by name).
The necessary overrides are baked into each Object subclass by the
new JS_OBJECT macro, which works similarly to C_OBJECT in LibCore.
Thanks to @Dexesttp for suggesting this approach. :^)
Instead of only checking the class_name(), we now generate an is_foo()
virtual in the wrapper generator. (It's currently something we override
on Bindings::Wrapper, which is not really scalable.)
Longer term we'll need to think up something smarter for verifying that
one wrapper "is" another type of wrapper.
Also let's settle on calling the operation of fetching the "this" value
from the Interpreter and converting it to a specific Object pointer
typed_this() since consistency is nice.
To make sure that everything is set up correctly in objects before we
start adding properties to them, we split cell allocation into 3 steps:
1. Allocate a cell of appropriate size from the Heap
2. Call the C++ constructor on the cell
3. Call initialize() on the constructed object
The job of initialize() is to define all the initial properties.
Doing it in a second pass guarantees that the Object has a valid Shape
and can find its own GlobalObject.
More work towards supporting multiple global objects. Native C++ code
now get a GlobalObject& and don't have to ask the Interpreter for it.
I've added macros for declaring and defining native callbacks since
this was pretty tedious and this makes it easier next time we want to
change any of these signatures.
Get rid of the weird old signature:
- int StringType::to_int(bool& ok) const
And replace it with sensible new signature:
- Optional<int> StringType::to_int() const
Objects should get the GlobalObject from themselves instead. However,
it's not yet available during construction so this only switches code
that happens after construction.
To support multiple global objects, Interpreter needs to stop holding
on to "the" global object and let each object graph own their global.
We need to move towards supporting multiple global objects, which will
be a large refactoring. To keep it manageable, let's do it in steps,
starting with giving Object a way to find the GlobalObject it lives
inside by asking its Shape for it.
This makes them a bit more compact and improves consistency as
to_boolean() and to_number() already use this style as well.
Also re-order the types to match the table in the spec document.
This adds regex parsing/lexing, as well as a relatively empty
RegExpObject. The purpose of this patch is to allow the engine to not
get hung up on parsing regexes. This will aid in finding new syntax
errors (say, from google or twitter) without having to replace all of
their regexes first!
Includes all traps except the following: [[Call]], [[Construct]],
[[OwnPropertyKeys]].
An important implication of this commit is that any call to any virtual
Object method has the potential to throw an exception. These methods
were not checked in this commit -- a future commit will have to protect
these various method calls throughout the codebase.
This new struct is now returned from get_own_property_descriptor. To
preserve the old functionality of returning an object, there is now a
get_own_property_descriptor_object method, for use in
{Object,Reflect}.getOwnPropertyDescriptor().
This change will be useful for the implementation of Proxies, which do a
lot of descriptor checks. We want to avoid as many object gets and puts
as possible.
When calling Object.defineProperty, there is now a difference between
omitting a descriptor attribute and specifying that it is false. For
example, "{}" and "{ configurable: false }" will have different
attribute values.
Object::set_prototype() now returns a boolean indicating success.
Setting the prototype to an identical object is always considered
successful, even if the object is non-extensible.
.. and make travis run it.
I renamed check-license-headers.sh to check-style.sh and expanded it so
that it now also checks for the presence of "#pragma once" in .h files.
It also checks the presence of a (single) blank line above and below the
"#pragma once" line.
I also added "#pragma once" to all the files that need it: even the ones
we are not check.
I also added/removed blank lines in order to make the script not fail.
I also ran clang-format on the files I modified.
Previously, the relational operators where casting any value to double
and comparing the results according to C++ semantics.
This patch makes the relational operators in JS behave according to the
standard specification.
Since we don't have BigInt yet, the implementation doesn't take it into
account.
Moved PreferredType from Object to Value. Value::to_primitive now
passes preferred_type to Object::to_primitive.
This patch adds an IndexedProperties object for storing indexed
properties within an Object. This accomplishes two goals: indexed
properties now have an associated descriptor, and objects now gracefully
handle sparse properties.
The IndexedProperties class is a wrapper around two other classes, one
for simple indexed properties storage, and one for general indexed
property storage. Simple indexed property storage is the common-case,
and is simply a vector of properties which all have attributes of
default_attributes (writable, enumerable, and configurable).
General indexed property storage is for a collection of indexed
properties where EITHER one or more properties have attributes other
than default_attributes OR there is a property with a large index (in
particular, large is '200' or higher).
Indexed properties are now treated relatively the same as storage within
the various Object methods. Additionally, there is a custom iterator
class for IndexedProperties which makes iteration easy. The iterator
skips empty values by default, but can be configured otherwise.
Likewise, it evaluates getters by default, but can be set not to.
Previously, the Object class had many different types of functions for
each action. For example: get_by_index, get(PropertyName),
get(FlyString). This is a bit verbose, so these methods have been
shortened to simply use the PropertyName structure. The methods then
internally call _by_index if necessary. Note that the _by_index
have been made private to enforce this change.
Secondly, a clear distinction has been made between "putting" and
"defining" an object property. "Putting" should mean modifying a
(potentially) already existing property. This is akin to doing "a.b =
'foo'".
This implies two things about put operations:
- They will search the prototype chain for setters and call them, if
necessary.
- If no property exists with a particular key, the put operation
should create a new property with the default attributes
(configurable, writable, and enumerable).
In contrast, "defining" a property should completely overwrite any
existing value without calling setters (if that property is
configurable, of course).
Thus, all of the many JS objects have had any "put" calls changed to
"define_property" calls. Additionally, "put_native_function" and
"put_native_property" have had their "put" replaced with "define".
Finally, "put_own_property" has been made private, as all necessary
functionality should be exposed with the put and define_property
methods.
This patch adds `Array.prototype.reduceRight()` method to LibJS Runtime. The implementation is (to my best knowledge) conformant to https://tc39.es/ecma262/#sec-array.prototype.reduceright.
Short test in `LibJS/Tests/Array.prototype-generic-functions.js` demonstrates that the function can be applied to other objects besides `Array`.
This changes Accessor's m_{getter,setter} from Value to Function* which
seems like a better API to me - a getter/setter must either be a
function or missing, and the creation of an accessor with other values
must be prevented by the parser and Object.defineProperty() anyway.
Also add Accessor::set_{getter,setter}() so we can reuse an already
created accessor when evaluating an ObjectExpression with getter/setter
shorthand syntax.
This patch adds `Array.prototype.reduce()` method to LibJS Runtime.
The implementation is (to my best knowledge) comformant to ECMA262.
The test `Array.prototype-generic-functions.js` demonstrates that the
function can be applied to other objects besides `Array`.
Let's treat it as zero like the ECMAScript spec does in toInteger().
That way we can use to_i32() and don't have to care about weird input
input values where a number is expected, i.e.
"foo".charAt() === "f"
"foo".charAt("bar") === "f"
"foo".charAt(0) === "f"
This patch adds a GetterSetterPair object. Values can now store pointers
to objects of this type. These objects are created when using
Object.defineProperty and providing an accessor descriptor.
This saves us both a bit of time and accuracy, as Serenity's strtod()
still is a little bit off sometimes - and stringifying the result and
parsing it again just increases that offset.
As these parameter-less overloads don't change the value's type and
just assume Type::Number, naming them as_i32() and as_size_t() is more
appropriate.
This patch is unfortunately rather large and might make some things feel
bloated, but it is necessary to fix a few flaws in LibJS, primarily
blindly coercing values to numbers without exception checks - i.e.
interpreter.argument(0).to_i32(); // can fail!!!
Some examples where the interpreter would actually crash:
var o = { toString: () => { throw Error() } };
+o;
o - 1;
"foo".charAt(o);
"bar".repeat(o);
To fix this, we now have the following...
to_double(Interpreter&)
to_i32()
to_i32(Interpreter&)
to_size_t()
to_size_t(Interpreter&)
...and a whole lot of exception checking.
There's intentionally no to_double(), use as_double() directly instead.
This way we still can use these convenient utility functions but don't
need to check for exceptions if we are sure the value already is a
number.
Fixes#2267.
Passing a Heap& to it only to then call interpreter() on that is weird.
Let's just give it the Interpreter& directly, like some of the other
to_something() functions.
This commit adds the following classes: SymbolObject, SymbolConstructor,
SymbolPrototype, and Symbol. This commit does not introduce any
new functionality to the Object class, so they cannot be used as
property keys in objects.
There are now two API's on Value:
- Value::to_string(Interpreter&) -- may throw.
- Value::to_string_without_side_effects() -- will never throw.
These are some pretty big sweeping changes, so it's possible that I did
some part the wrong way. We'll work it out as we go. :^)
Fixes#2123.
Rather than printing them to stderr directly the parser now keeps a
Vector<Error>, which allows the "owner" of the parser to consume them
individually after parsing.
The Error struct has a message, line number, column number and a
to_string() helper function to format this information into a meaningful
error message.
The Function() constructor will now include an error message when
throwing a SyntaxError.
The ECMAScript spec defines multiple equality operations which are used
all over the spec; this patch introduces them. Of course, the two
primary equality operations are AbtractEquals ('==') and StrictEquals
('==='), which have been renamed to 'abstract_eq' and 'strict_eq' in
this patch.
In support of the two operations mentioned above, the following have
also been added: SameValue, SameValueZero, and SameValueNonNumeric.
These are important to have, because they are used elsewhere in the spec
aside from the two primary equality comparisons.
"[Function.length is] the number of formal parameters. This number
excludes the rest parameter and only includes parameters before
the first one with a default value." - MDN
Console methods are now Value(void) functions.
JavaScript functions in the JavaScript ConsoleObject are now implemented
as simple wrappers around Console methods.
This will make it possible for LibJS users to easily override the
default behaviour of JS console functions (even their return value!)
once we add a way to override Console behaviour.
Adds the ability for function arguments to have default values. This
works for standard functions as well as arrow functions. Default values
are not printed in a <function>.toString() call, as nodes cannot print
their source string representation.
A ConsoleMessage is a struct cointaining:
* AK::String text; represents the text of the message sent
to the console.
* ConsoleMessageKind kind; represents the kind of JS `console` function
from which the message was sent.
Now, Javascript `console` functions only send a ConsoleMessage to the
Interpreter's Console instead of printing text directly to stdout.
The Console then stores the recived ConsoleMessage in
Console::m_messages; the Console does not print to stdout by default.
You can set Console::on_new_message to a void(ConsoleMessage&); this
function will get call everytime a new message is added to the Console's
messages and can be used, for example, to print ConsoleMessages to
stdout or to color the output based on the kind of ConsoleMessage.
In this patch, I also:
* Re-implement all the previously implemented functions in the
JavaScript ConsoleObject, as wrappers around Console functions
that add new message to the Console.
* Implement console.clear() like so:
- m_messages get cleared;
- a new_message with kind set ConsoleMessageKind::Clear gets added
to m_messages, its text is an empty AK::String;
* Give credit to linusg in Console.cpp since I used his
console.trace() algorithm in Console::trace().
I think that having this abstration will help us in the implementation
of a browser console or a JS debugger. We could also add more MetaData
to ConsoleMessage, e.g. Object IDs of the arguments passed to console
functions in order to make hyperlinks, Timestamps, ecc.; which could be
interesting to see.
This will also help in implementing a `/bin/js` option to make, for
example, return a ConsoleMessageWrapper to console functions instead of
undefined. This will be useful to make tests for functions like
console.count() and console.countClear(). :^)
I.e. they don't require the |this| value to be a string object and
"can be transferred to other kinds of objects for use as a method" as
the spec describes it.
This commit introduces a way to get an object's own properties in the
correct order. The "correct order" for JS object properties is first all
array-like index properties (numeric keys) sorted by insertion order,
followed by all string properties sorted by insertion order.
Objects also now print correctly in the repl! Before this commit:
courage ~/js-tests $ js
> ({ foo: 1, bar: 2, baz: 3 })
{ bar: 2, foo: 1, baz: 3 }
After:
courage ~/js-tests $ js
> ({ foo: 1, bar: 2, baz: 3 })
{ foo: 1, bar: 2, baz: 3 }
Now that Array.prototype.join() is producing the correct results we
can remove the separate code path for arrays in Value::to_number()
and treat them like all other objects - using to_primitive() with
number as the preferred type and then calling to_number() on the
result.
This is how the spec descibes it.
This also means we don't crash anymore when trying to coerce
[<empty>] to a number - it now does the following:
[<empty>] - to string - "" - to number - 0
[<empty>, <empty>] - to string - "," - to number - NaN
Currently we would create an empty array of size 0 and appening results
of the callback function while skipping empty values.
This is incorrect, we should be initializing a full array of the correct
size beforehand and then inserting the results while still skipping
empty values.
Wrong: new Array(5).map(() => {}) // []
Right: new Array(5).map(() => {}) // [<empty> * 5]