In this patch only top level and not the more complicated for loop using
statements are supported. Also, as noted in the latest meeting of tc39
async parts of the spec are not stage 3 thus not included.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This was state only used by the parser to output an error with
appropriate location. This shrinks the size of ObjectExpression from
120 bytes down to just 56. This saves roughly 2.5 MiB when loading
twitter.
Before this change, each AST node had a 64-byte SourceRange member.
This SourceRange had the following layout:
filename: StringView (16 bytes)
start: Position (24 bytes)
end: Position (24 bytes)
The Position structs have { line, column, offset }, all members size_t.
To reduce memory consumption, AST nodes now only store the following:
source_code: NonnullRefPtr<SourceCode> (8 bytes)
start_offset: u32 (4 bytes)
end_offset: u32 (4 bytes)
SourceCode is a new ref-counted data structure that keeps the filename
and original parsed source code in a single location, and all AST nodes
have a pointer to it.
The start_offset and end_offset can be turned into (line, column) when
necessary by calling SourceCode::range_from_offsets(). This will walk
the source code string and compute line/column numbers on the fly, so
it's not necessarily fast, but it should be rare since this information
is primarily used for diagnostics and exception stack traces.
With this, ASTNode shrinks from 80 bytes to 32 bytes. This gives us a
~23% reduction in memory usage when loading twitter.com/awesomekling
(330 MiB before, 253 MiB after!) :^)
This requires a special case with names as the default function is
supposed to have a unique name ("*default*" in our case) but when
checked should have name "default".
Since tagged template literals can inspect the raw string it is not a
syntax error to have invalid escapes. However the cooked value should be
`undefined`.
We accomplish this by tracking whether parse_string_literal
fails and then using a NullLiteral (since UndefinedLiteral is not a
thing) and finally converting null in tagged template execution to
undefined.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This commit has no behavior changes.
In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
While adding spec comments to PerformEval, I noticed we were missing
multiple steps.
Namely, these were:
- Checking if the host will allow us to compile the string
(allowing LibWeb to perform CSP for eval)
- The parser's initial state depending on the environment around us
on direct eval:
- Allowing new.target via eval in functions
- Allowing super calls and super properties via eval in classes
- Disallowing the use of the arguments object in class field
initializers at eval's parse time
- Setting ScriptOrModule of eval's execution context
The spec allows us to apply the additional parsing steps in any order.
The method I have gone with is passing in a struct to the parser's
constructor, which overrides the parser's initial state to (dis)allow
the things stated above from the get-go.
The same expression is not allowed to contain both the
logical && and || operators, and the coalescing ?? operator.
This patch changes how "forbidden" tokens are handled, using a
finite set instead of an Vector. This supports much more efficient
merging of the forbidden tokens when propagating forward, and
allowing the return of forbidden tokens to parent contexts.
This patch makes check_identifier_name_for_assignment_validity()
take a FlyString instead of a StringView. We then exploit this by
passing FlyString in more places via flystring_value().
This gives a ~1% speedup when parsing the largest Discord JS file.
The big changes are:
- Allow strings as Module{Export, Import}Name
- Properly track declarations in default export statements
However, the spec is a little strange in that it allows function and
class declarations without a name in default export statements.
This is quite hard to fully implement without rewriting more of the
parser so for now this behavior is emulated by faking things with
function and class expressions. See the comments in
parse_export_statement for details on the hacks and where it goes wrong.
The three major changes are:
- Parsing parameters, the function body, and then the full assembled
function source all separately. This is required by the spec, as
function parameters and body must be valid each on their own, which
cannot be guaranteed if we only ever parse the full function.
- Returning an ECMAScriptFunctionObject instead of a FunctionExpression
that needs to be evaluated separately. This vastly simplifies the
{Async,AsyncGenerator,Generator,}Function constructor implementations.
Drop '_node' from the function name accordingly.
- The prototype is now determined via GetPrototypeFromConstructor and
passed to OrdinaryFunctionCreate.
This includes:
- Parsing proper LabelledStatements with try_parse_labelled_statement()
- Removing LabelableStatement
- Implementing the LoopEvaluation semantics via loop_evaluation() in
each IterationStatement subclass; and IterationStatement evaluation
via {For,ForIn,ForOf,ForAwaitOf,While,DoWhile}Statement::execute()
- Updating ReturnStatement, BreakStatement and ContinueStatement to
return the appropriate completion types
- Basically reimplementing TryStatement and SwitchStatement according to
the spec, using completions
- Honoring result completion types in AsyncBlockStart and
OrdinaryCallEvaluateBody
- Removing any uses of the VM unwind mechanism - most importantly,
VM::throw_exception() now exclusively sets an exception and no longer
triggers any unwinding mechanism.
However, we already did a good job updating all of LibWeb and userland
applications to not use it, and the few remaining uses elsewhere don't
rely on unwinding AFAICT.
This allows us to only perform checks like export bindings existing only
for modules. Also this makes it easier to set strict and other state
variables with TemporaryChanges.
This commit adds support for the most bare bones version of async
functions, support for async generator functions, async arrow functions
and await expressions are TODO.
This saves having to save and load the parser state.
This could give an incorrect token in some cases where the parser
communicates to the lexer. However this is not applicable in any of the
current usages and this would require one to parse the current token
as normal which is exactly what you don't want to do in that scenario.
We now propagate this flag to FunctionDeclaration, and then also into
ECMAScriptFunctionObject.
This will be used to disable optimizations that aren't safe in the
presence of direct eval().
This gives FunctionNode a "might need arguments object" boolean flag and
sets it based on the simplest possible heuristic for this: if we
encounter an identifier called "arguments" or "eval" up to the next
(nested) function declaration or expression, we won't need an arguments
object. Otherwise, we *might* need one - the final decision is made in
the FunctionDeclarationInstantiation AO.
Now, this is obviously not perfect. Even if you avoid eval, something
like `foo.arguments` will still trigger a false positive - but it's a
start and already massively cuts down on needlessly allocated objects,
especially in real-world code that is often minified, and so a full
"arguments" identifier will be an actual arguments object more often
than not.
To illustrate the actual impact of this change, here's the number of
allocated arguments objects during a full test-js run:
Before:
- Unmapped arguments objects: 78765
- Mapped arguments objects: 2455
After:
- Unmapped arguments objects: 18
- Mapped arguments objects: 37
This results in a ~5% speedup of test-js on my Linux host machine, and
about 3.5% on i686 Serenity in QEMU (warm runs, average of 5).
The following microbenchmark (calling an empty function 1M times) runs
25% faster on Linux and 45% on Serenity:
function foo() {}
for (var i = 0; i < 1_000_000; ++i)
foo();
test262 reports no changes in either direction, apart from a speedup :^)
Before this we used an ad-hoc combination of references and 'variables'
stored in a hashmap. This worked in most cases but is not spec like.
Additionally hoisting, dynamically naming functions and scope analysis
was not done properly.
This patch fixes all of that by:
- Implement BindingInitialization for destructuring assignment.
- Implementing a new ScopePusher which tracks the lexical and var
scoped declarations. This hoists functions to the top level if no
lexical declaration name overlaps. Furthermore we do checking of
redeclarations in the ScopePusher now requiring less checks all over
the place.
- Add methods for parsing the directives and statement lists instead
of having that code duplicated in multiple places. This allows
declarations to pushed to the appropriate scope more easily.
- Remove the non spec way of storing 'variables' in
DeclarativeEnvironment and make Reference follow the spec instead of
checking both the bindings and 'variables'.
- Remove all scoping related things from the Interpreter. And instead
use environments as specified by the spec. This also includes fixing
that NativeFunctions did not produce a valid FunctionEnvironment
which could cause issues with callbacks and eval. All
FunctionObjects now have a valid NewFunctionEnvironment
implementation.
- Remove execute_statements from Interpreter and instead use
ASTNode::execute everywhere this simplifies AST.cpp as you no longer
need to worry about which method to call.
- Make ScopeNodes setup their own environment. This uses four
different methods specified by the spec
{Block, Function, Eval, Global}DeclarationInstantiation with the
annexB extensions.
- Implement and use NamedEvaluation where specified.
Additionally there are fixes to things exposed by these changes to eval,
{for, for-in, for-of} loops and assignment.
Finally it also fixes some tests in test-js which where passing before
but not now that we have correct behavior :^).
Since there are only a number of statements where labels can actually be
used we now also only store labels when necessary.
Also now tracks the first continue usage of a label since this might not
be valid but that can only be determined after we have parsed the
statement.
Also ensures the correct error does not get wiped by load_state.
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.
As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.