beenull/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-25 17:10:23 +00:00

Author	SHA1	Message	Date
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00
Linus Groh	3dbf4c62b0	LibJS: Use GenericLexer for Token::string_value() This is, and I can't stress this enough, a lot better than all the manual bounds checking and indexing that was going on before. Also fixes a small bug where "\u{}" wouldn't get rejected as invalid unicode escape sequence.	2020-10-29 11:52:31 +01:00
Linus Groh	b5bd05b717	LibJS: Don't parse numeric literal containing 8 or 9 as octal If the value has a leading zero (allowed in non-strict mode) but contains the digits 8 or 9 it can't be an octal number.	2020-10-28 21:11:32 +01:00
Linus Groh	66e315959d	LibJS: Allow all line terminators to be used for line continuations	2020-10-25 19:45:47 +01:00
Marcin Gasperowicz	e5ddcadd3c	LibJS: Parse line continuations in string literals properly Newlines after line continuation were inserted into the string literals. This patch makes the parser ignore the newlines after \ and also makes it so that "use strict" containing a line continuation is not a valid "use strict".	2020-10-25 15:16:47 +01:00
Linus Groh	4fb96afafc	LibJS: Support LegacyOctalEscapeSequence in string literals https://tc39.es/ecma262/#sec-additional-syntax-string-literals The syntax and semantics of 11.8.4 is extended as follows except that this extension is not allowed for strict mode code: Syntax EscapeSequence:: CharacterEscapeSequence LegacyOctalEscapeSequence NonOctalDecimalEscapeSequence HexEscapeSequence UnicodeEscapeSequence LegacyOctalEscapeSequence:: OctalDigit [lookahead ∉ OctalDigit] ZeroToThree OctalDigit [lookahead ∉ OctalDigit] FourToSeven OctalDigit ZeroToThree OctalDigit OctalDigit ZeroToThree :: one of 0 1 2 3 FourToSeven :: one of 4 5 6 7 NonOctalDecimalEscapeSequence :: one of 8 9 This definition of EscapeSequence is not used in strict mode or when parsing TemplateCharacter. Note It is possible for string literals to precede a Use Strict Directive that places the enclosing code in strict mode, and implementations must take care to not use this extended definition of EscapeSequence with such literals. For example, attempting to parse the following source text must fail: function invalid() { "\7"; "use strict"; }	2020-10-24 16:34:01 +02:00
Linus Groh	15642874f3	LibJS: Support all line terminators (LF, CR, LS, PS) https://tc39.es/ecma262/#sec-line-terminators	2020-10-22 10:06:30 +02:00
Linus Groh	e80217a746	LibJS: Unify syntax highlighting So far we have three different syntax highlighters for LibJS: - js's Line::Editor stylization - JS::MarkupGenerator - GUI::JSSyntaxHighlighter This not only caused repetition of most token types in each highlighter but also a lot of inconsistency regarding the styling of certain tokens: - JSSyntaxHighlighter was considering TokenType::Period to be an operator whereas MarkupGenerator categorized it as punctuation. - MarkupGenerator was considering TokenType::{Break,Case,Continue,Default,Switch,With} control keywords whereas JSSyntaxHighlighter just disregarded them. - MarkupGenerator considered some future reserved keywords invalid and others not. JSSyntaxHighlighter and js disregarded most. Adding a new token type meant adding it to ENUMERATE_JS_TOKENS as well as each individual highlighter's switch/case construct. I added a TokenCategory enum, and each TokenType is now associated with a certain category, which the syntax highlighters can then use for styling rather than operating on the token type directly. This also makes changing a token's category everywhere easier, should we need to do that (e.g. I decided to make TokenType::{Period,QuestionMarkPeriod} TokenCategory::Operator for now, but we might want to change them to Punctuation).	2020-10-04 23:41:31 +02:00
Linus Groh	d1d9545875	LibJS: Add missing reserved words to Token::is_identifier_name() This is being used in match_identifier_name(), for example when parsing property keys - the list was incomplete, likely as some token types were added later, leading to some unexpected syntax errors: > var e = {}; undefined > e.extends = "a"; e.extends = "a"; ^ Uncaught exception: [SyntaxError]: Unexpected token Extends. Expected IdentifierName (line: 1, column: 3) Fixes #3128.	2020-08-14 10:58:51 +02:00
Nico Weber	ce95628b7f	Unicode: Try s/codepoint/code_point/g again This time, without trailing 's'. Ran: git grep -l 'codepoint' | xargs sed -ie 's/codepoint/code_point/g'	2020-08-05 22:33:42 +02:00
Nico Weber	19ac1f6368	Revert "Unicode: s/codepoint/code_point/g" This reverts commit `ea9ac3155d`. It replaced "codepoint" with "code_points", not "code_point".	2020-08-05 22:33:42 +02:00
Andreas Kling	ea9ac3155d	Unicode: s/codepoint/code_point/g Unicode calls them "code points" so let's follow their style.	2020-08-03 19:06:41 +02:00
Nico Weber	9e32ad6c99	LibJS: Fix \x escapes of bytes with high bit set With this, typing `"\xff"` into Browser's console no longer makes the app crash. While here, also make the \u handler call append_codepoint() instead of calling an overload where it's not immediately clear which overload is getting called. This has no behavior change.	2020-07-22 19:21:35 +02:00
Sergey Bugaev	1274c244d5	LibJS: Fix out-of-bounds read when parsing escape sequences We cannot look at i+1'th character until we verify it's there.	2020-06-01 17:37:44 +02:00
Matthew Olsson	e415dd4e9c	LibJS: Handle hex and unicode escape sequences in string literals Introduces the following syntax: '\x55' '\u26a0' '\u{1f41e}'	2020-05-18 17:58:17 +02:00
mattco98	adb4accab3	LibJS: Add template literals Adds fully functioning template literals. Because template literals contain expressions, most of the work has to be done in the Lexer rather than the Parser. And because of the complexity of template literals (expressions, nesting, escapes, etc), the Lexer needs to have some template-related state. When entering a new template literal, a TemplateLiteralStart token is emitted. When inside a literal, all text will be parsed up until a '${' or '`' (or EOF, but that's a syntax error) is seen, and then a TemplateLiteralExprStart token is emitted. At this point, the Lexer proceeds as normal, however it keeps track of the number of opening and closing curly braces it has seen in order to determine the close of the expression. Once it finds a matching curly brace for the '${', a TemplateLiteralExprEnd token is emitted and the state is updated accordingly. When the Lexer is inside of a template literal, but not an expression, and sees a '`', this must be the closing grave: a TemplateLiteralEnd token is emitted. The state required to correctly parse template strings consists of a vector (for nesting) of two pieces of information: whether or not we are in a template expression (as opposed to a template string); and the count of the number of unmatched open curly braces we have seen (only applicable if the Lexer is currently in a template expression). TODO: Add support for template literal newlines in the JS REPL (this will cause a syntax error currently): > `foo > bar` 'foo bar'	2020-05-04 16:46:31 +02:00
Linus Groh	95b51e857d	LibJS: Add TokenType::TemplateLiteral This is required for template literals - we're not quite there yet, but at least the parser can now tell us when this token is encountered - currently this yields "Unexpected token Invalid". Not really helpful. The character is a "backtick", but as we already have TokenType::{StringLiteral,RegexLiteral} this seemed like a fitting name. This also enables syntax highlighting for template literals in the js REPL and LibGUI's JSSyntaxHighlighter.	2020-04-24 11:18:57 +02:00
Stephan Unverwerth	bf5b251684	LibJS: Allow reserved words as keys in object expressions.	2020-04-18 22:23:20 +02:00
Stephan Unverwerth	500f6d9e3a	LibJS: Add numeric literal parsing for different bases and exponents	2020-04-05 16:01:22 +02:00
Andreas Kling	a860a3f793	LibJS: Hack the lexer to allow numbers with decimals This is very hackish and should definitely be improved. :^)	2020-04-04 23:13:48 +02:00
Andreas Kling	a1c718e387	LibJS: Use some macro magic to avoid duplicating all the token types	2020-03-30 13:11:07 +02:00
Andreas Kling	1923051c5b	LibJS: Lexer and parser support for "switch" statements	2020-03-29 15:03:58 +02:00
0xtechnobabble	bc002f807a	LibJS: Parse object expressions	2020-03-21 10:08:58 +01:00
0xtechnobabble	cfd710eb31	LibJS: Implement null and undefined literals	2020-03-16 13:42:13 +01:00
Stephan Unverwerth	3389021291	LibJS: Unescape strings in Token::string_value()	2020-03-14 16:00:28 +01:00
Andreas Kling	f94099f796	LibJS: Strip double-quote characters from StringLiteral tokens This is very hackish since I'm just doing it to make progress on something else. :^)	2020-03-14 12:40:06 +01:00
Stephan Unverwerth	c0e6234219	LibJS: Lex single quote strings, escaped chars and unterminated strings	2020-03-14 12:13:53 +01:00
Oriko	e273203d27	LibJS: Add missing tokens to name()	2020-03-14 11:30:31 +01:00
Stephan Unverwerth	15d5b2d29e	LibJS: Add operator precedence parsing Obey precedence and associativity rules when parsing expressions with chained operators.	2020-03-14 00:11:24 +01:00
Stephan Unverwerth	f3a9eba987	LibJS: Add JavaScript lexer and parser This adds a basic JavaScript lexer and parser. It can parse the currently existing demo programs. More work needs to be done to turn it into a complete parser that can parse arbitrary JS code. The lexer outputs tokens with preceding whitespace and comments in the trivia member. This should allow us to generate the exact source code by concatenating the generated tokens. The parser is written in a way that it always returns a complete syntax tree. Error conditions are represented as nodes in the tree. This simplifies the code and allows it to be used as an early stage parser, e.g. for parsing JS documents in an IDE while editing the source code.	2020-03-12 09:25:49 +01:00

31 commits
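
The rule stated in commit b5bd05b717 (a leading zero makes a non-strict numeric literal octal unless it contains the digit 8 or 9) can be illustrated with a minimal sketch. This is not LibJS's actual code: the function name `legacy_integer_literal_value` is hypothetical, and the sketch assumes non-strict mode and an input consisting only of ASCII digits, with no prefix, decimal point, or exponent.

```cpp
#include <string_view>

// Hypothetical helper for illustration only; not taken from LibJS.
// Assumes `digits` is non-empty and contains only the characters '0'..'9'.
double legacy_integer_literal_value(std::string_view digits)
{
    // A leading zero followed by more digits suggests a legacy octal literal...
    bool is_octal = digits.size() > 1 && digits[0] == '0';

    // ...but a single 8 or 9 anywhere means the literal is decimal.
    for (char c : digits) {
        if (c == '8' || c == '9')
            is_octal = false;
    }

    // Accumulate the value in the chosen base.
    int const base = is_octal ? 8 : 10;
    double value = 0;
    for (char c : digits)
        value = value * base + (c - '0');
    return value;
}
```

Under these assumptions, `017` is read as octal while `018` and `09` fall back to decimal, which is the distinction the commit message describes.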