Resulting in a massive rename across almost the entire codebase! Alongside
the namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
This adds a new MetaProperty AST node which will be used for
'new.target' and 'import.meta' meta properties. The parser now
distinguishes between "in function context" and "in arrow function
context" (which is required for this).
When encountering TokenType::New we will attempt to parse it as a meta
property and fall back to regular new expression parsing if that fails,
much like the parsing of labelled statements.
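Roughly, the attempt-then-fall-back flow looks like the sketch below (a
standalone illustration with made-up names and a toy token stream, not the
actual LibJS parser):

#include <cstddef>
#include <string>
#include <vector>

// Toy token stream; the real parser operates on Token objects from the lexer.
struct TokenStream {
    std::vector<std::string> tokens;
    size_t index { 0 };

    bool match(std::string const& value) const
    {
        return index < tokens.size() && tokens[index] == value;
    }
    void consume() { ++index; }
};

enum class ParsedAs { MetaProperty, NewExpression };

// After seeing 'new', check for '.' 'target'; otherwise rewind and let the
// regular new-expression parsing take over (much like labelled statements).
ParsedAs parse_after_new_keyword(TokenStream& stream)
{
    auto saved_index = stream.index;
    stream.consume(); // 'new'
    if (stream.match(".")) {
        stream.consume();
        if (stream.match("target")) {
            stream.consume();
            return ParsedAs::MetaProperty; // would build a MetaProperty AST node
        }
    }
    stream.index = saved_index; // roll back the speculative parse
    return ParsedAs::NewExpression;
}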
This is a bit nicer for two reasons:
- The absence of line number/column information is no longer signalled
by 'values are zero' but by an empty Optional
- When reporting syntax errors with position information other than the
current token's position we had to store line and column ourselves,
like this:
auto foo_start_line = m_parser_state.m_current_token.line_number();
auto foo_start_column = m_parser_state.m_current_token.line_column();
...
syntax_error("...", foo_start_line, foo_start_column);
Which now becomes:
auto foo_start = position();
...
syntax_error("...", foo_start);
This makes it easier to report correct positions for syntax errors
that only emerge a few tokens later :^)
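For illustration, a minimal sketch of the Optional-based position reporting
(std::optional stands in for AK::Optional; names and signatures are
assumptions, not the actual LibJS code):

#include <cstddef>
#include <cstdio>
#include <optional>

// "No position known" is an empty Optional instead of line/column being zero.
struct Position {
    size_t line { 0 };
    size_t column { 0 };
};

void syntax_error(char const* message, std::optional<Position> position = {})
{
    if (position.has_value())
        printf("Syntax Error: %s (line: %zu, column: %zu)\n", message, position->line, position->column);
    else
        printf("Syntax Error: %s\n", message);
}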
By having the "is this a use strict directive?" logic in
parse_string_literal() we would apply it to *any* string literal, which
is incorrect and would lead to false positives - e.g.:
"use strict" + 1
`"use strict"`
"\123"; ({"use strict": ...})
Relevant part from the spec which is now implemented properly:
[...] and where each ExpressionStatement in the sequence consists
entirely of a StringLiteral token [...]
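A simplified sketch of that rule (standalone illustration; statements are
reduced to two made-up fields, which is not how the real parser represents
them):

#include <string>
#include <vector>

// A statement only counts towards the directive prologue if it consists
// entirely of a single string literal token; its value as written must be
// exactly "use strict" (so "use strict" + 1 or the template literal
// `"use strict"` never enable strict mode).
struct LeadingStatement {
    bool is_single_string_literal { false };
    std::string literal_value; // the string as written, without quotes
};

bool directive_prologue_enables_strict_mode(std::vector<LeadingStatement> const& statements)
{
    for (auto const& statement : statements) {
        if (!statement.is_single_string_literal)
            break; // the directive prologue ends at the first other statement
        if (statement.literal_value == "use strict")
            return true;
    }
    return false;
}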
I also got rid of UseStrictDirectiveState which is not needed anymore.
Fixes #3903.
- A regular function can have duplicate parameters except in strict mode
or if its parameter list is not "simple" (has a default or rest
parameter)
- An arrow function can never have duplicate parameters
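In other words, the check boils down to something like this sketch
(standalone, made-up types; the real parser reports a specific message and
position instead of a bool, as the examples below show):

#include <string>
#include <unordered_set>
#include <vector>

// Duplicates are only legal for a non-arrow function, in non-strict code,
// with a "simple" parameter list (no default values, no rest parameter).
struct ParameterList {
    std::vector<std::string> names;
    bool has_default_parameter { false };
    bool has_rest_parameter { false };
    bool is_arrow_function { false };
    bool is_strict_mode { false };
};

bool has_illegal_duplicate_parameter(ParameterList const& parameters)
{
    bool duplicates_allowed = !parameters.is_strict_mode
        && !parameters.is_arrow_function
        && !parameters.has_default_parameter
        && !parameters.has_rest_parameter;
    if (duplicates_allowed)
        return false;
    std::unordered_set<std::string> seen;
    for (auto const& name : parameters.names) {
        if (!seen.insert(name).second)
            return true; // this is where the specific error would point
    }
    return false;
}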
Compared to other engines I opted for more useful syntax error messages
than a generic "duplicate parameter name not allowed in this context":
"use strict"; function test(foo, foo) {}
                                 ^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in strict mode (line: 1, column: 34)
function test(foo, foo = 1) {}
                   ^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in function with default parameter (line: 1, column: 20)
function test(foo, ...foo) {}
                      ^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in function with rest parameter (line: 1, column: 23)
(foo, foo) => {}
      ^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in arrow function (line: 1, column: 7)
https://tc39.es/ecma262/#sec-additional-syntax-string-literals
The syntax and semantics of 11.8.4 is extended as follows except that
this extension is not allowed for strict mode code:
Syntax
EscapeSequence::
CharacterEscapeSequence
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
LegacyOctalEscapeSequence::
OctalDigit [lookahead ∉ OctalDigit]
ZeroToThree OctalDigit [lookahead ∉ OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit
ZeroToThree :: one of
0 1 2 3
FourToSeven :: one of
4 5 6 7
NonOctalDecimalEscapeSequence :: one of
8 9
This definition of EscapeSequence is not used in strict mode or when
parsing TemplateCharacter.
Note
It is possible for string literals to precede a Use Strict Directive
that places the enclosing code in strict mode, and implementations must
take care to not use this extended definition of EscapeSequence with
such literals. For example, attempting to parse the following source
text must fail:
function invalid() { "\7"; "use strict"; }
This separates matching/parsing of statements and declarations and
fixes a few edge cases where the parser would incorrectly accept a
declaration where only a statement is allowed - for example:
if (foo) const a = 1;
for (var bar;;) function b() {}
while (baz) class c {}
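A tiny sketch of the distinction (illustrative only, and simplified - for
example, 'let' can also appear as a plain identifier in sloppy mode):
statement-only contexts such as the ones above must reject anything that
can only begin a declaration.

#include <string>
#include <unordered_set>

// Tokens that (in this simplified view) start a declaration rather than a
// statement; statement-only positions reject them.
bool token_starts_declaration(std::string const& token_value)
{
    static std::unordered_set<std::string> const declaration_keywords {
        "const", "let", "class", "function"
    };
    return declaration_keywords.count(token_value) > 0;
}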
This allows us to provide better error messages as we can point the
syntax error location to the exact first invalid parameter instead of
always the end of the function within an object literal or class
definition.
Before this change:
const Foo = { set bar() {} }
                           ^
Uncaught exception: [SyntaxError]: Object setter property must have one argument (line: 1, column: 28)
class Foo { set bar() {} }
                         ^
Uncaught exception: [SyntaxError]: Class setter method must have one argument (line: 1, column: 26)
After this change:
const Foo = { set bar() {} }
                      ^
Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 23)
class Foo { set bar() {} }
                    ^
Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 21)
The only possible downside of this change is that class getters/setters
and functions in objects are not distinguished in the message anymore -
I don't think that's important though, and classes are (mostly) just
syntactic sugar anyway.
I'm about to add even more options and a bunch of unnamed true/false
arguments is really not helpful. Let's make this a single parse options
parameter using bit flags.
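Something along these lines (the flag names here are illustrative, not
necessarily the actual ones):

#include <cstdint>

// A single options parameter built from bit flags replaces a growing list
// of unnamed true/false arguments.
enum FunctionNodeParseOptions : uint8_t {
    CheckForFunctionAndName   = 1 << 0,
    AllowSuperPropertyLookup  = 1 << 1,
    AllowSuperConstructorCall = 1 << 2,
    IsGetterFunction          = 1 << 3,
    IsSetterFunction          = 1 << 4,
};

// e.g. parse_function_node<FunctionExpression>(CheckForFunctionAndName | IsGetterFunction);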
This simplifies try_parse_arrow_function_expression() and fixes a few
cases that should not produce an arrow function AST but did:
(a,,) => {}
(a b) => {}
(a ...b) => {}
(...b a) => {}
The new parsing logic checks whether parens are expected and, if so, uses
parse_function_parameters(), rolling back if a new syntax error occurs
during that attempt. Otherwise it's just an identifier, in which case we
parse the single parameter ourselves.
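The rollback mechanism is roughly this (a standalone sketch with made-up
types - the real parser saves and restores its full lexer/parser state
rather than a token index):

#include <cstddef>
#include <string>
#include <vector>

struct ToyParser {
    size_t token_index { 0 };
    std::vector<std::string> errors;

    struct Snapshot {
        size_t token_index { 0 };
        size_t error_count { 0 };
    };

    Snapshot save_state() const { return { token_index, errors.size() }; }

    void load_state(Snapshot const& snapshot)
    {
        token_index = snapshot.token_index;
        errors.resize(snapshot.error_count); // drop errors from the failed attempt
    }

    // Stand-in for the real parameter parsing; may append syntax errors.
    void parse_function_parameters() { /* ... */ }

    // Returns false (and rolls back) if parsing the parameter list produced
    // new syntax errors, meaning this isn't an arrow function after all.
    bool try_parse_arrow_function_parameters()
    {
        auto snapshot = save_state();
        parse_function_parameters();
        if (errors.size() > snapshot.error_count) {
            load_state(snapshot);
            return false;
        }
        return true;
    }
};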
'continue' is no longer allowed outside of a loop, and an unlabeled
'break' is no longer allowed outside of a loop or switch statement.
Labeled 'break' statements are still allowed everywhere, even if the
label does not exist.
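One way to picture the new validation (illustrative fields, not the actual
parser state):

#include <cstddef>

// The parser tracks how deep it currently is inside breakable/continuable
// constructs and rejects 'continue' outside a loop and unlabeled 'break'
// outside a loop or switch.
struct StatementContext {
    size_t loop_depth { 0 };    // incremented while parsing for/while/do-while bodies
    size_t switch_depth { 0 };  // incremented while parsing switch cases

    bool continue_is_allowed() const { return loop_depth > 0; }
    bool unlabeled_break_is_allowed() const { return loop_depth > 0 || switch_depth > 0; }
    // A labeled 'break' is still accepted everywhere, even for unknown labels.
};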
The check for invalid lhs and assignment to eval/arguments in strict
mode should happen for all kinds of assignment expressions, not just
AssignmentOp::Assignment.
Since blocks can't be strict by themselves, it makes no sense for them
to store whether or not they are strict. Strict-ness is now stored in
the Program and FunctionNode ASTNodes. Fixes issue #3641
literal methods; add EnvironmentRecord fields and methods to
LexicalEnvironment
Adding EnvironmentRecord's fields and methods lets us throw an exception
when |this| is accessed before it is initialized (the super constructor
in a derived class has not yet been called), or when |this| is
initialized a second time (the super constructor was already called).
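Conceptually it behaves like this sketch (standalone C++ with made-up names;
the real code records the binding on the LexicalEnvironment and throws a JS
ReferenceError rather than a C++ exception):

#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

class ToyEnvironmentRecord {
public:
    void bind_this_value(std::string this_value)
    {
        if (m_this_value.has_value())
            throw std::runtime_error("ReferenceError: |this| is already initialized (super() already called)");
        m_this_value = std::move(this_value);
    }

    std::string const& get_this_binding() const
    {
        if (!m_this_value.has_value())
            throw std::runtime_error("ReferenceError: |this| has not been initialized (super() not yet called)");
        return *m_this_value;
    }

private:
    std::optional<std::string> m_this_value;
};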
This adds regex parsing/lexing, as well as a relatively empty
RegExpObject. The purpose of this patch is to allow the engine to not
get hung up on parsing regexes. This will aid in finding new syntax
errors (say, from google or twitter) without having to replace all of
their regexes first!
This patch adds function declaration hoisting. The mechanism
is similar to var hoisting. Hoisted function declarations are to be put
before the hoisted var declarations, hence they have to be treated
separately.
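The ordering matters roughly as in this sketch (illustrative only; real
bindings hold JS values, not strings):

#include <string>
#include <unordered_map>
#include <vector>

// When entering a scope, hoisted function declarations are installed first,
// then hoisted 'var' declarations are added as undefined if they do not
// already exist.
struct ToyScope {
    std::unordered_map<std::string, std::string> variables; // name -> value (as text, for brevity)

    void enter_scope(std::vector<std::string> const& hoisted_function_names,
                     std::vector<std::string> const& hoisted_var_names)
    {
        for (auto const& name : hoisted_function_names)
            variables[name] = "<function>";       // function declarations go in first
        for (auto const& name : hoisted_var_names)
            variables.emplace(name, "undefined"); // emplace() keeps an existing function binding
    }
};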
This rewrite drastically increases the accuracy of object literal parsing.
Additionally, an "assertIsSyntaxError" function has been added to
test-common.js to assist in testing syntax errors.
Adds the ability for a scope (either a function or the entire program)
to be in strict mode. Scopes default to non-strict mode.
There are two ways to determine the strict-ness of the JS engine:
1. In the parser, this can be checked via the parser state's
m_is_strict_mode boolean. If true, the Parser is currently parsing in
strict mode. This is done so that the Parser can generate syntax
errors at parse time, which is required in some cases.
2. With Interpreter.is_strict_mode(). This allows strict mode checking
at runtime as opposed to compile time.
Additionally, in order to test this, a global isStrictMode() function
has been added to the JS ReplObject under the test-mode flag.
This util function on the Error struct takes the source and returns a
string like this, based on the line and column information it has:
foo bar
^
Which can be shown in the repl for syntax errors :^)
Rather than printing them to stderr directly the parser now keeps a
Vector<Error>, which allows the "owner" of the parser to consume them
individually after parsing.
The Error struct has a message, line number, column number and a
to_string() helper function to format this information into a meaningful
error message.
The Function() constructor will now include an error message when
throwing a SyntaxError.
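A minimal sketch of such an Error record (field and method names are
assumptions, not the exact LibJS struct):

#include <cstddef>
#include <string>

// A message plus position, and a helper that formats them into a readable
// diagnostic like the ones shown in the examples above.
struct ParserError {
    std::string message;
    size_t line { 0 };
    size_t column { 0 };

    std::string to_string() const
    {
        if (line == 0 || column == 0)
            return message;
        return message + " (line: " + std::to_string(line) + ", column: " + std::to_string(column) + ")";
    }
};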
Giving the lexer the ability to generate errors adds unnecessary
complexity - also it only calls its syntax_error() function in one place
anyway ("unterminated string literal"). But since the lexer *also* emits
tokens like Eof or UnterminatedStringLiteral, it should be up to the
consumer of these tokens to decide what to do.
Also remove the option to not print errors to stderr as that's not
relevant anymore.
Adds fully functioning template literals. Because template literals
contain expressions, most of the work has to be done in the Lexer rather
than the Parser. And because of the complexity of template literals
(expressions, nesting, escapes, etc), the Lexer needs to have some
template-related state.
When entering a new template literal, a TemplateLiteralStart token is
emitted. When inside a literal, all text will be parsed up until a '${'
or '`' (or EOF, but that's a syntax error) is seen, and then a
TemplateLiteralExprStart token is emitted. At this point, the Lexer
proceeds as normal, however it keeps track of the number of opening
and closing curly braces it has seen in order to determine the close
of the expression. Once it finds a matching curly brace for the '${',
a TemplateLiteralExprEnd token is emitted and the state is updated
accordingly.
When the Lexer is inside of a template literal, but not an expression,
and sees a '`', this must be the closing grave: a TemplateLiteralEnd
token is emitted.
The state required to correctly parse template strings consists of a
vector (for nesting) of two pieces of information: whether or not we
are in a template expression (as opposed to a template string); and
the count of the number of unmatched open curly braces we have seen
(only applicable if the Lexer is currently in a template expression).
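That state can be pictured like this (a standalone sketch with made-up
names; the real Lexer folds this into its existing tokenizing loop):

#include <cstddef>
#include <vector>

// One entry per nested template literal, tracking whether the lexer is
// currently inside a '${...}' expression and how many '{' are still
// unmatched inside it.
struct TemplateState {
    bool in_expression { false };
    size_t open_brace_count { 0 };
};

struct TemplateLexerState {
    std::vector<TemplateState> template_states; // one entry per nesting level

    void on_template_literal_start() { template_states.push_back({}); }
    void on_expression_start() { template_states.back().in_expression = true; }

    void on_open_brace()
    {
        if (!template_states.empty() && template_states.back().in_expression)
            ++template_states.back().open_brace_count;
    }

    // Returns true when a '}' closes the '${' expression itself, i.e. a
    // TemplateLiteralExprEnd token should be emitted.
    bool on_close_brace()
    {
        if (template_states.empty() || !template_states.back().in_expression)
            return false;
        if (template_states.back().open_brace_count == 0) {
            template_states.back().in_expression = false;
            return true;
        }
        --template_states.back().open_brace_count;
        return false;
    }

    void on_template_literal_end() { template_states.pop_back(); }
};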
TODO: Add support for template literal newlines in the JS REPL (this will
cause a syntax error currently):
> `foo
> bar`
'foo
bar'
Adds the ability for function arguments to have default values. This
works for standard functions as well as arrow functions. Default values
are not printed in a <function>.toString() call, as nodes cannot print
their source string representation.
Instead of having fprintf()s all over the place we can now use
syntax_error("message") or syntax_error("message", line, column).
This takes care of a consistent format, appending a newline and getting
the line number and column of the current token if the last two params
are omitted.
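Roughly like this sketch (illustrative names and types; the real helper
lives on the Parser and reads the current token's position):

#include <cstddef>
#include <cstdio>

// If line/column are omitted (0), fall back to the position of the current
// token, and always print in a consistent format with a trailing newline.
struct ParserSketch {
    struct Token {
        size_t line { 0 };
        size_t column { 0 };
    };
    Token current_token;

    void syntax_error(char const* message, size_t line = 0, size_t column = 0)
    {
        if (line == 0)
            line = current_token.line;
        if (column == 0)
            column = current_token.column;
        fprintf(stderr, "Syntax Error: %s (line: %zu, column: %zu)\n", message, line, column);
    }
};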
"var" declarations are hoisted to the nearest function scope, while
"let" and "const" are hoisted to the nearest block scope.
This is done by the parser, which keeps two scope stacks, one stack
for the current var scope and one for the current let/const scope.
When the interpreter enters a scope, we walk all of the declarations
and insert them into the variable environment.
We don't support the temporal dead zone for let/const yet.
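The two stacks can be pictured like this (illustrative sketch, not the
actual parser members):

#include <string>
#include <utility>
#include <vector>

// 'var' names collect into the innermost function-level scope, while
// 'let'/'const' names collect into the innermost block-level scope.
struct ToyScopeStacks {
    std::vector<std::vector<std::string>> var_scopes;      // pushed per function
    std::vector<std::vector<std::string>> lexical_scopes;  // pushed per block

    void add_var_declaration(std::string name) { var_scopes.back().push_back(std::move(name)); }
    void add_lexical_declaration(std::string name) { lexical_scopes.back().push_back(std::move(name)); }

    void enter_function() { var_scopes.emplace_back(); lexical_scopes.emplace_back(); }
    void exit_function() { var_scopes.pop_back(); lexical_scopes.pop_back(); }

    void enter_block() { lexical_scopes.emplace_back(); }
    void exit_block() { lexical_scopes.pop_back(); }
};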