0ct0pu5/ladybird

Author	SHA1	Message	Date
Idan Horowitz	681787de76	LibJS: Add support for async functions This commit adds support for the most bare-bones version of async functions; support for async generator functions, async arrow functions and await expressions is still TODO.	2021-11-10 08:48:27 +00:00
davidot	eeb42c21d1	LibJS: Lex private identifiers, identifiers prefixed with a '#'	2021-10-20 23:19:17 +01:00
Nico Weber	b8dc3661ac	Libraries: Fix -Wunreachable-code warnings from clang	2021-10-08 23:33:46 +02:00
davidot	ac2c3a73b1	LibJS: Add a specific test for invalid unicode characters in the lexer Also fixes that it tried to make substrings past the end of the source if we overran the source length.	2021-10-03 17:42:05 +02:00
Luke Wilde	ae0bdda86e	LibJS: Remove read buffer overflow in Lexer::consume The position is added to manually in the line terminator and Unicode character cases. While it checks for EOF after doing so, the EOF check used `!=` instead of `<`, meaning if the position went _over_ the source length, it wouldn't think it was EOF and would cause read buffer overflows. For example, `0xea` followed by `0xfd` would cause this.	2021-10-02 17:16:09 +02:00
Andreas Kling	76bafe5542	LibJS: Always inline two hot (and trivial) functions in JS::Lexer This improves parsing time on a large chunk of JS by ~5%.	2021-09-18 19:54:24 +02:00
Andreas Kling	8bde4e94d8	LibJS: Make Lexer::s_keywords store keywords as FlyString This allows O(1) comparison against lexed keywords, since we lex to FlyString.	2021-09-18 19:54:24 +02:00
Andreas Kling	bf46845819	LibJS: Avoid a temporary AK::String when lexing already-seen identifiers By using the FlyString(StringView) constructor instead of the FlyString(String) one, we can dodge a temporary String construction. This improves parsing time on a large chunk of JS by ~1.6%.	2021-09-18 19:54:24 +02:00
Linus Groh	a50e33abe3	LibJS: Skip ID_{Start,Continue} property lookup for any ASCII characters Before this change, Lexer::is_identifier_{start,middle}() would do a Unicode property lookup via Unicode::code_point_has_property() quite frequently, especially for common characters like .,;{}[]() etc. Since these and any other ASCII characters not covered by the alpha / alphanumeric check are known to not have the ID_Start / ID_Continue (except '_', which is special-cased now) properties, we can easily avoid this function call.	2021-09-14 02:48:57 +02:00
Andreas Kling	d7578ddebb	LibJS: Share "parsed identifiers" between copied JS::Lexer instances When we save/load state in the parser, we preserve the lexer state by simply making a copy of it. This was made extremely heavy by the lexer keeping a cache of all parsed identifiers. It keeps the cache to ensure that StringViews into parsed Unicode escape sequences don't become dangling views when the Token goes out of scope. This patch solves the problem by replacing the Vector<FlyString> which was used to cache the identifiers with a ref-counted HashTable<FlyString> instead. Since the purpose of the cache is just to keep FlyStrings alive, it's fine for all Lexer instances to share the cache. And as a bonus, using a HashTable instead of a Vector replaces the O(n) accesses with O(1) ones. This makes a 1.9 MiB JavaScript file parse in 0.6s instead of 24s. :^)	2021-09-10 23:18:00 +02:00
davidot	bbddfeef4b	LibJS: Clean up token constructor and use method instead for identifiers Having two large constructors with just one parameter difference in the middle seems quite dangerous, so just do it with a method.	2021-09-06 08:43:38 +01:00
Brian Gianforcaro	77d8a65498	LibJS: Fix incorrect Lexer VERIFY when parsing Unicode characters This bug was discovered via OSS-Fuzz. It's possible to fall through to this assert with char_size == 1, so we need to account for that in the VERIFY(..). Repro test case can be found in the OSS-Fuzz bug: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=37296	2021-08-25 09:21:23 +01:00
davidot	c108c8ff24	LibJS: Disallow yield expression correctly in formal parameters And add ZERO WIDTH NO BREAK SPACE to valid whitespace.	2021-08-24 07:42:37 +01:00
davidot	7bcffd1b6a	LibJS: Fix some small remaining issues with parsing unicode escapes Added a test to ensure the behavior stays the same. We now throw on direct usage of an escaped keyword with a specific error to make it clearer to the user.	2021-08-24 07:42:37 +01:00
Timothy Flynn	1259dc3623	LibJS: Allow Unicode escape sequences in identifiers For example, "property.br\u{64}wn" should resolve to "property.brown". To support this behavior, this commit changes the Token class to hold both the evaluated identifier name and a view into the original source for the unevaluated name. There are some contexts in which identifiers are not allowed to contain Unicode escape sequences; for example, export statements of the form "export {} from foo.js" forbid escapes in the identifier "from". The test file is added to .prettierignore because prettier will replace all escaped Unicode sequences with their unescaped value.	2021-08-19 23:49:25 +02:00
davidot	47bc72bcf6	LibJS: Correctly handle Unicode characters in JS source text Also recognize additional white space characters.	2021-08-16 23:20:04 +01:00
davidot	106f9e30d7	LibJS: Force the lexer to parse a regex when expecting a statement	2021-08-16 23:20:04 +01:00
davidot	4cc95ae39d	LibJS: Fix that a windows-style new line was not escaped properly	2021-08-16 23:20:04 +01:00
davidot	7613c22b06	LibJS: Add a mode to parse JS as a module In a module, strict mode should be enabled at the start of parsing, and import and export statements are allowed.	2021-08-15 23:51:47 +01:00
Ali Mohammad Pur	1a9518ebe3	LibJS: Implement parsing and evaluation for AssignmentPatterns e.g. `[...foo] = bar` can now be evaluated :^)	2021-07-11 21:41:54 +01:00
Ali Mohammad Pur	0292ad33eb	LibJS: Make a slash after a curly close mean not-division There's no grammar rule that allows this.	2021-07-02 14:59:03 +02:00
Andreas Kling	49018553d3	LibJS+LibCrypto: Allow '_' as a numeric literal separator :^) This patch adds support for the NumericLiteralSeparator concept from the ECMAScript grammar.	2021-06-26 16:30:35 +02:00
Linus Groh	714a96619f	LibJS: Disallow whitespace or comments between regex literal and flags If we consumed whitespace and/or comments after a RegexLiteral token, the following token must not be RegexFlags - no whitespace or comments are allowed between the closing / and the flag characters. Fixes #8201.	2021-06-22 14:08:40 +01:00
Linus Groh	597cf88c08	LibJS: Implement the 'Hashbang Grammar for JS' proposal Stage 3 since August 2019 - we already have shebang stripping implemented in js(1), so this removes it from there in favor of adding support to the lexer directly. Most straightforward proposal and implementation I've ever seen :^) https://github.com/tc39/proposal-hashbang	2021-06-18 20:35:23 +01:00
Idan Horowitz	690eb3bb8a	LibJS: Add support for hex, octal & binary big integer literals	2021-06-14 01:45:04 +01:00
Andreas Kling	39ad705c13	LibJS: Use the new is_ascii_foo() helpers from AK These constexpr helpers generate nicer code than the LibC ctype.h variants, so let's make use of them. :^)	2021-06-13 19:11:29 +02:00
Gunnar Beutner	d476144565	Userland: Allow building SerenityOS with -funsigned-char Some of the code assumed that chars were always signed while that is not the case on ARM hosts. Also, some of the code tried to use EOF (-1) in a way similar to what fgetc() does, however instead of storing the characters in an int variable a char was used. While this seemed to work it also meant that character 0xFF would be incorrectly seen as an end-of-file. Careful reading of fgetc() reveals that fgetc() stores character data in an int where valid characters are in the range of 0-255 and the EOF value is explicitly outside of that range (usually -1).	2021-06-13 18:52:58 +02:00
Stephan Unverwerth	10ceeb092f	Everywhere: Use s.unverwerth@serenityos.org :^)	2021-05-29 12:30:08 +01:00
Linus Groh	ebdeed087c	Everywhere: Use linusg@serenityos.org for my copyright headers	2021-04-22 22:51:19 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Linus Groh	a178255a8b	LibJS: Use 'if constexpr' / dbgln_if() instead of '#if LEXER_DEBUG'	2021-04-18 18:14:50 +02:00
Jean-Baptiste Boric	0039ecb189	LibJS: Keep track of file names, lines and columns inside the AST	2021-03-01 11:14:36 +01:00
Andreas Kling	635a5eec75	LibJS: Remove a whole bunch of unnecessary #includes	2021-02-10 09:13:29 +01:00
asynts	eea72b9b5c	Everywhere: Hook up remaining debug macros to Debug.h.	2021-01-25 09:47:36 +01:00
asynts	acdcf59a33	Everywhere: Remove unnecessary debug comments. It would be tempting to uncomment these statements, but that won't work with the new changes. This was done with the following commands: find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec awk -i inplace '$0 !~ /\/\/#define/ { if (!toggle) { print; } else { toggle = !toggle } } ; $0 ~/\/\/#define/ { toggle = 1 }' {} \; find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec awk -i inplace '$0 !~ /\/\/ #define/ { if (!toggle) { print; } else { toggle = !toggle } } ; $0 ~/\/\/ #define/ { toggle = 1 }' {} \;	2021-01-25 09:47:36 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00

36 commits
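
The end-of-file fix described in ae0bdda86e (comparing with `<` instead of `!=`) can be illustrated with a small sketch. This is not the LibJS code: `EofLexer`, `skip`, `has_more_buggy` and `has_more_fixed` are hypothetical names, and the 3-byte advance is an assumption about how a lexer might step over a lead byte such as `0xea`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical toy lexer, not the real JS::Lexer.
struct EofLexer {
    std::string_view source;
    std::size_t position = 0;

    // Advance manually, as a lexer might after decoding a multi-byte character.
    void skip(std::size_t byte_count) { position += byte_count; }

    // Buggy variant: only notices EOF when position lands exactly on the end.
    bool has_more_buggy() const { return position != source.size(); }

    // Fixed variant: any position at or past the end counts as EOF.
    bool has_more_fixed() const { return position < source.size(); }
};
```

With a two-byte source and a three-byte advance, the position overshoots the end. The `!=` check still reports more input, while the `<` check correctly reports EOF.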
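
The ASCII fast path described in a50e33abe3 can be sketched roughly as follows. Everything here is illustrative: `slow_has_id_start` is a hypothetical stand-in for the Unicode property lookup (it only counts calls and treats every non-ASCII code point as ID_Start), and the handling of `$` is taken from the ECMAScript identifier grammar rather than from the commit.

```cpp
// Counts how often the (pretend) expensive lookup runs.
static int g_property_lookups = 0;

// Hypothetical stand-in for a Unicode ID_Start property lookup.
// Assumption for this sketch only: every non-ASCII code point qualifies.
static bool slow_has_id_start(char32_t code_point)
{
    ++g_property_lookups;
    return code_point >= 0x80;
}

static bool is_identifier_start(char32_t code_point)
{
    // Cheap checks first: ASCII letters plus the special-cased '_' and '$'.
    if ((code_point >= U'a' && code_point <= U'z')
        || (code_point >= U'A' && code_point <= U'Z')
        || code_point == U'_' || code_point == U'$')
        return true;

    // Any other ASCII character (.,;{}[]() and so on) cannot have ID_Start,
    // so the property lookup is skipped entirely.
    if (code_point < 0x80)
        return false;

    return slow_has_id_start(code_point);
}
```

Only non-ASCII code points reach the slow path, which is the effect the commit message describes for common punctuation.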
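
The idea behind d7578ddebb, sharing one ref-counted identifier cache between copied lexers instead of copying a vector, can be sketched with standard library types. This is an analogy under stated assumptions, not the LibJS implementation: `CachingLexer` and `intern` are hypothetical names, and `std::shared_ptr<std::unordered_set<std::string>>` stands in for the ref-counted `HashTable<FlyString>` mentioned in the commit.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical toy lexer, not the real JS::Lexer.
struct CachingLexer {
    // Shared by every copy of the lexer; copying only bumps a reference count.
    std::shared_ptr<std::unordered_set<std::string>> parsed_identifiers
        = std::make_shared<std::unordered_set<std::string>>();
    std::size_t position = 0;

    // Keep the identifier alive in the shared cache and return a stable reference.
    // Average O(1) insertion; references to elements survive rehashing.
    const std::string& intern(std::string identifier)
    {
        return *parsed_identifiers->insert(std::move(identifier)).first;
    }
};
```

Saving parser state by copying the lexer then costs a pointer copy rather than a copy of every cached identifier, and identifiers interned after the copy are visible through both instances.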