Commit graph

31 commits

Author SHA1 Message Date
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Linus Groh
3dbf4c62b0 LibJS: Use GenericLexer for Token::string_value()
This is, and I can't stress this enough, a lot better than all the
manual bounds checking and indexing that was going on before.

Also fixes a small bug where "\u{}" wouldn't get rejected as invalid
unicode escape sequence.
2020-10-29 11:52:31 +01:00
Linus Groh
b5bd05b717 LibJS: Don't parse numeric literal containing 8 or 9 as octal
If the value has a leading zero (allowed in non-strict mode) but
contains the digits 8 or 9 it can't be an octal number.
2020-10-28 21:11:32 +01:00
Linus Groh
66e315959d LibJS: Allow all line terminators to be used for line continuations 2020-10-25 19:45:47 +01:00
Marcin Gasperowicz
e5ddcadd3c LibJS: Parse line continuations in string literals properly
Newlines after line continuation were inserted into the string 
literals. This patch makes the parser ignore the newlines after \ and
also makes it so that "use strict" containing a line continuation is 
not a valid "use strict".
2020-10-25 15:16:47 +01:00
Linus Groh
4fb96afafc LibJS: Support LegacyOctalEscapeSequence in string literals
https://tc39.es/ecma262/#sec-additional-syntax-string-literals

The syntax and semantics of 11.8.4 is extended as follows except that
this extension is not allowed for strict mode code:

Syntax

    EscapeSequence::
        CharacterEscapeSequence
        LegacyOctalEscapeSequence
        NonOctalDecimalEscapeSequence
        HexEscapeSequence
        UnicodeEscapeSequence

    LegacyOctalEscapeSequence::
        OctalDigit [lookahead ∉ OctalDigit]
        ZeroToThree OctalDigit [lookahead ∉ OctalDigit]
        FourToSeven OctalDigit
        ZeroToThree OctalDigit OctalDigit

    ZeroToThree :: one of
        0 1 2 3

    FourToSeven :: one of
        4 5 6 7

    NonOctalDecimalEscapeSequence :: one of
        8 9

This definition of EscapeSequence is not used in strict mode or when
parsing TemplateCharacter.

Note

It is possible for string literals to precede a Use Strict Directive
that places the enclosing code in strict mode, and implementations must
take care to not use this extended definition of EscapeSequence with
such literals. For example, attempting to parse the following source
text must fail:

function invalid() { "\7"; "use strict"; }
2020-10-24 16:34:01 +02:00
Linus Groh
15642874f3 LibJS: Support all line terminators (LF, CR, LS, PS)
https://tc39.es/ecma262/#sec-line-terminators
2020-10-22 10:06:30 +02:00
Linus Groh
e80217a746 LibJS: Unify syntax highlighting
So far we have three different syntax highlighters for LibJS:

- js's Line::Editor stylization
- JS::MarkupGenerator
- GUI::JSSyntaxHighlighter

This not only caused repetition of most token types in each highlighter
but also a lot of inconsistency regarding the styling of certain tokens:

- JSSyntaxHighlighter was considering TokenType::Period to be an
  operator whereas MarkupGenerator categorized it as punctuation.
- MarkupGenerator was considering TokenType::{Break,Case,Continue,
  Default,Switch,With} control keywords whereas JSSyntaxHighlighter just
  disregarded them
- MarkupGenerator considered some future reserved keywords invalid and
  others not. JSSyntaxHighlighter and js disregarded most

Adding a new token type meant adding it to ENUMERATE_JS_TOKENS as well
as each individual highlighter's switch/case construct.

I added a TokenCategory enum, and each TokenType is now associated to a
certain category, which the syntax highlighters then can use for styling
rather than operating on the token type directly. This also makes
changing a token's category everywhere easier, should we need to do that
(e.g. I decided to make TokenType::{Period,QuestionMarkPeriod}
TokenCategory::Operator for now, but we might want to change them to
Punctuation.
2020-10-04 23:41:31 +02:00
Linus Groh
d1d9545875 LibJS: Add missing reserved words to Token::is_identifier_name()
This is being used in match_identifier_name(), for example when parsing
property keys - the list was incomplete, likely as some token types were
added later, leading to some unexpected syntax errors:

> var e = {};
undefined
> e.extends = "a";
e.extends = "a";
  ^
Uncaught exception: [SyntaxError]: Unexpected token Extends. Expected IdentifierName (line: 1, column: 3)

Fixes #3128.
2020-08-14 10:58:51 +02:00
Nico Weber
ce95628b7f Unicode: Try s/codepoint/code_point/g again
This time, without trailing 's'. Ran:

    git grep -l 'codepoint' | xargs sed -ie 's/codepoint/code_point/g
2020-08-05 22:33:42 +02:00
Nico Weber
19ac1f6368 Revert "Unicode: s/codepoint/code_point/g"
This reverts commit ea9ac3155d.
It replaced "codepoint" with "code_points", not "code_point".
2020-08-05 22:33:42 +02:00
Andreas Kling
ea9ac3155d Unicode: s/codepoint/code_point/g
Unicode calls them "code points" so let's follow their style.
2020-08-03 19:06:41 +02:00
Nico Weber
9e32ad6c99 LibJS: Fix \x escapes of bytes with high bit set
With this, typing `"\xff"` into Browser's console no longer
makes the app crash.

While here, also make the \u handler call append_codepoint()
instead of calling an overload where it's not immediately clear
which overload is getting called. This has no behavior change.
2020-07-22 19:21:35 +02:00
Sergey Bugaev
1274c244d5 LibJS: Fix out-of-bounds read when parsing escape sequences
We cannot look at i+1'th character until we verify it's there.
2020-06-01 17:37:44 +02:00
Matthew Olsson
e415dd4e9c LibJS: Handle hex and unicode escape sequences in string literals
Introduces the following syntax:

'\x55'
'\u26a0'
'\u{1f41e}'
2020-05-18 17:58:17 +02:00
mattco98
adb4accab3 LibJS: Add template literals
Adds fully functioning template literals. Because template literals
contain expressions, most of the work has to be done in the Lexer rather
than the Parser. And because of the complexity of template literals
(expressions, nesting, escapes, etc), the Lexer needs to have some
template-related state.

When entering a new template literal, a TemplateLiteralStart token is
emitted. When inside a literal, all text will be parsed up until a '${'
or '`' (or EOF, but that's a syntax error) is seen, and then a
TemplateLiteralExprStart token is emitted. At this point, the Lexer
proceeds as normal, however it keeps track of the number of opening
and closing curly braces it has seen in order to determine the close
of the expression. Once it finds a matching curly brace for the '${',
a TemplateLiteralExprEnd token is emitted and the state is updated
accordingly.

When the Lexer is inside of a template literal, but not an expression,
and sees a '`', this must be the closing grave: a TemplateLiteralEnd
token is emitted.

The state required to correctly parse template strings consists of a
vector (for nesting) of two pieces of information: whether or not we
are in a template expression (as opposed to a template string); and
the count of the number of unmatched open curly braces we have seen
(only applicable if the Lexer is currently in a template expression).

TODO: Add support for template literal newlines in the JS REPL (this will
cause a syntax error currently):

    > `foo
    > bar`
    'foo
    bar'
2020-05-04 16:46:31 +02:00
Linus Groh
95b51e857d LibJS: Add TokenType::TemplateLiteral
This is required for template literals - we're not quite there yet, but at
least the parser can now tell us when this token is encountered -
currently this yields "Unexpected token Invalid". Not really helpful.

The character is a "backtick", but as we already have
TokenType::{StringLiteral,RegexLiteral} this seemed like a fitting name.

This also enables syntax highlighting for template literals in the js
REPL and LibGUI's JSSyntaxHighlighter.
2020-04-24 11:18:57 +02:00
Stephan Unverwerth
bf5b251684 LibJS: Allow reserved words as keys in object expressions. 2020-04-18 22:23:20 +02:00
Stephan Unverwerth
500f6d9e3a LibJS: Add numeric literal parsing for different bases and exponents 2020-04-05 16:01:22 +02:00
Andreas Kling
a860a3f793 LibJS: Hack the lexer to allow numbers with decimals
This is very hackish and should definitely be improved. :^)
2020-04-04 23:13:48 +02:00
Andreas Kling
a1c718e387 LibJS: Use some macro magic to avoid duplicating all the token types 2020-03-30 13:11:07 +02:00
Andreas Kling
1923051c5b LibJS: Lexer and parser support for "switch" statements 2020-03-29 15:03:58 +02:00
0xtechnobabble
bc002f807a LibJS: Parse object expressions 2020-03-21 10:08:58 +01:00
0xtechnobabble
cfd710eb31 LibJS: Implement null and undefined literals 2020-03-16 13:42:13 +01:00
Stephan Unverwerth
3389021291 LibJS: Unescape strings in Token::string_value() 2020-03-14 16:00:28 +01:00
Andreas Kling
f94099f796 LibJS: Strip double-quote characters from StringLiteral tokens
This is very hackish since I'm just doing it to make progress on
something else. :^)
2020-03-14 12:40:06 +01:00
Stephan Unverwerth
c0e6234219 LibJS: Lex single quote strings, escaped chars and unterminated strings 2020-03-14 12:13:53 +01:00
Oriko
e273203d27 LibJS: Add missing tokens to name() 2020-03-14 11:30:31 +01:00
Stephan Unverwerth
15d5b2d29e LibJS: Add operator precedence parsing
Obey precedence and associativity rules when parsing expressions
with chained operators.
2020-03-14 00:11:24 +01:00
Stephan Unverwerth
f3a9eba987 LibJS: Add Javascript lexer and parser
This adds a basic Javascript lexer and parser. It can parse the
currently existing demo programs. More work needs to be done to
turn it into a complete parser than can parse arbitrary JS Code.

The lexer outputs tokens with preceeding whitespace and comments
in the trivia member. This should allow us to generate the exact
source code by concatenating the generated tokens.

The parser is written in a way that it always returns a complete
syntax tree. Error conditions are represented as nodes in the
tree. This simplifies the code and allows it to be used as an
early stage parser, e.g for parsing JS documents in an IDE while
editing the source code.:
2020-03-12 09:25:49 +01:00