0ct0pu5/ladybird

Author	SHA1	Message	Date
Andreas Kling	34344120f2	AK: Make "foo"_string infallible Stop worrying about tiny OOMs. Work towards #20405.	2023-08-07 16:03:27 +02:00
Simon Wanner	a2efecac03	LibJS: Parse slashes after reserved identifiers correctly Previously we were unable to parse code like `yield/2` because `/2` was parsed as a regex. At the same time `for (a in / b/)` was parsed as a division. This is solved by defaulting to division in the lexer, but calling `force_slash_as_regex()` from the parser whenever an IdentifierName is parsed as a ReservedWord.	2023-06-10 07:20:33 +02:00
Linus Groh	09d40bfbb2	Everywhere: Use _{short_,}string to create Strings from literals	2023-02-25 20:51:49 +01:00
Evan Smal	3226ce3d83	LibJS: Remove some usage of DeprecatedString usage from Lexer This changes the filename member from DeprecatedString to String. Parser has also been updated to meet the updated Lexer interface.	2023-01-26 20:25:25 +00:00
Evan Smal	cfa6b4d815	LibJS: Remove DeprecatedString usage from Token	2023-01-26 20:25:25 +00:00
Timothy Flynn	f3db548a3d	AK+Everywhere: Rename FlyString to DeprecatedFlyString DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, B) write a new FlyString class that is tied to String.	2023-01-09 23:00:24 +00:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
davidot	b3edd94869	LibJS: Treat '\\' as an escaped character in template literals Before this change we would ignore that the second backslash is escaped and template strings ending with ` \\` would be unterminated as the second slash was used to escape the closing quote.	2022-11-15 12:00:36 +00:00
sin-ack	fbc771efe9	Everywhere: Use default StringView constructor over nullptr While null StringViews are just as bad, these prevent the removal of StringView(char const*) as that constructor accepts a nullptr. No functional changes.	2022-07-12 23:11:35 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
Andreas Kling	92e0378dbd	LibJS: Always inline Lexer::current_code_point() This gives a ~1% speedup when parsing the largest Discord JS file.	2022-02-13 14:44:36 +01:00
Andreas Kling	3d9f2e2f94	LibJS: Add ASCII fast path to Lexer::current_code_point() If the current character under the lexer cursor is ASCII, we don't need to create a Utf8View to consume a full code point. This gives a ~3% speedup when parsing the largest Discord JS file.	2022-02-13 14:44:36 +01:00
Linus Groh	95a9f12b97	LibJS: Set Token's m_offset to the value's start index This makes much more sense than the current way of setting it to the Lexer's m_position after consuming the full value.	2022-01-19 20:33:08 +00:00
davidot	56c425eec1	LibJS: Detect invalid unicode and stop lexing at that point Previously we might swallow invalid unicode point which would skip valid ascii characters. This could be dangerous as we might skip a '"' thus not closing a string where we should. This might have been exploitable as it would not have been clear what code gets executed when looking at a script. Another approach to this would be simply replacing all invalid characters with the replacement character (this is what v8 does). But our lexer and parser are currently not set up for such a change.	2021-12-29 16:57:23 +01:00
davidot	a1308bfc60	LibJS: Make new lines in block comments reset line has token Before this a closing html comment would not be treated as a comment if directly following a block comment which was not the first token of its first line.	2021-12-21 14:04:23 +01:00
davidot	e751dcea43	LibJS: Treat private identifier as divisible token And also make sure private identifiers are correctly checked when synthesizing a binding pattern.	2021-11-30 17:05:32 +00:00
davidot	afde1821b5	LibJS: Disallow numerical separators in octal numbers and after '.'	2021-11-30 17:05:32 +00:00
Idan Horowitz	681787de76	LibJS: Add support for async functions This commit adds support for the most bare bones version of async functions, support for async generator functions, async arrow functions and await expressions are TODO.	2021-11-10 08:48:27 +00:00
davidot	eeb42c21d1	LibJS: Lex private identifiers, identifiers prefixed with a '#'	2021-10-20 23:19:17 +01:00
Nico Weber	b8dc3661ac	Libraries: Fix -Wunreachable-code warnings from clang	2021-10-08 23:33:46 +02:00
davidot	ac2c3a73b1	LibJS: Add a specific test for invalid unicode characters in the lexer Also fixes that it tried to make substrings past the end of the source if we overran the source length.	2021-10-03 17:42:05 +02:00
Luke Wilde	ae0bdda86e	LibJS: Remove read buffer overflow in Lexer::consume The position is added to manually in the line terminator and Unicode character cases. While it checks for EOF after doing so, the EOF check used `!=` instead of `<`, meaning if the position went _over_ the source length, it wouldn't think it was EOF and would cause read buffer overflows. For example, `0xea` followed by `0xfd` would cause this.	2021-10-02 17:16:09 +02:00
Andreas Kling	76bafe5542	LibJS: Always inline two hot (and trivial) functions in JS::Lexer This improves parsing time on a large chunk of JS by ~5%.	2021-09-18 19:54:24 +02:00
Andreas Kling	8bde4e94d8	LibJS: Make Lexer::s_keywords store keywords as FlyString This allows O(1) comparison against lexed keywords, since we lex to FlyString.	2021-09-18 19:54:24 +02:00
Andreas Kling	bf46845819	LibJS: Avoid a temporary AK::String when lexing already-seen identifiers By using the FlyString(StringView) constructor instead of the FlyString(String) one, we can dodge a temporary String construction. This improves parsing time on a large chunk of JS by ~1.6%.	2021-09-18 19:54:24 +02:00
Linus Groh	a50e33abe3	LibJS: Skip ID_{Start,Continue} property lookup for any ASCII characters Before this change, Lexer::is_identifier_{start,middle}() would do a Unicode property lookup via Unicode::code_point_has_property() quite frequently, especially for common characters like .,;{}[]() etc. Since these and any other ASCII characters not covered by the alpha / alphanumeric check are known to not have the ID_Start / ID_Continue (except '_', which is special-cased now) properties, we can easily avoid this function call.	2021-09-14 02:48:57 +02:00
Andreas Kling	d7578ddebb	LibJS: Share "parsed identifiers" between copied JS::Lexer instances When we save/load state in the parser, we preserve the lexer state by simply making a copy of it. This was made extremely heavy by the lexer keeping a cache of all parsed identifiers. It keeps the cache to ensure that StringViews into parsed Unicode escape sequences don't become dangling views when the Token goes out of scope. This patch solves the problem by replacing the Vector<FlyString> which was used to cache the identifiers with a ref-counted HashTable<FlyString> instead. Since the purpose of the cache is just to keep FlyStrings alive, it's fine for all Lexer instances to share the cache. And as a bonus, using a HashTable instead of a Vector replaces the O(n) accesses with O(1) ones. This makes a 1.9 MiB JavaScript file parse in 0.6s instead of 24s. :^)	2021-09-10 23:18:00 +02:00
davidot	bbddfeef4b	LibJS: Clean up token constructor and use method instead for identifiers Having two large constructor with just one parameter difference in the middle seems quite dangerous so just do it with a method.	2021-09-06 08:43:38 +01:00
Brian Gianforcaro	77d8a65498	LibJS: Fix incorrect Lexer VERIFY when parsing Unicode characters This bug was discovered via OSS fuzz, it's possible to fall through to this assert with a char_size == 1, so we need to account for that in the VERIFY(..). Repro test case can be found in the OSS fuzz bug: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=37296	2021-08-25 09:21:23 +01:00
davidot	c108c8ff24	LibJS: Disallow yield expression correctly in formal parameters And add ZERO WIDTH NO BREAK SPACE to valid whitespace.	2021-08-24 07:42:37 +01:00
davidot	7bcffd1b6a	LibJS: Fix some small remaining issues with parsing unicode escapes Added a test to ensure the behavior stays the same. We now throw on a direct usage of an escaped keywords with a specific error to make it more clear to the user.	2021-08-24 07:42:37 +01:00
Timothy Flynn	1259dc3623	LibJS: Allow Unicode escape sequences in identifiers For example, "property.br\u{64}wn" should resolve to "property.brown". To support this behavior, this commit changes the Token class to hold both the evaluated identifier name and a view into the original source for the unevaluated name. There are some contexts in which identifiers are not allowed to contain Unicode escape sequences; for example, export statements of the form "export {} from foo.js" forbid escapes in the identifier "from". The test file is added to .prettierignore because prettier will replace all escaped Unicode sequences with their unescaped value.	2021-08-19 23:49:25 +02:00
davidot	47bc72bcf6	LibJS: Correctly handle Unicode characters in JS source text Also recognize additional white space characters.	2021-08-16 23:20:04 +01:00
davidot	106f9e30d7	LibJS: Force the lexer to parse a regex when expecting a statement	2021-08-16 23:20:04 +01:00
davidot	4cc95ae39d	LibJS: Fix that a windows-style new line was not escaped properly	2021-08-16 23:20:04 +01:00
davidot	7613c22b06	LibJS: Add a mode to parse JS as a module In a module strict mode should be enabled at the start of parsing and we allow import and export statements.	2021-08-15 23:51:47 +01:00
Ali Mohammad Pur	1a9518ebe3	LibJS: Implement parsing and evaluation for AssignmentPatterns e.g. `[...foo] = bar` can now be evaluated :^)	2021-07-11 21:41:54 +01:00
Ali Mohammad Pur	0292ad33eb	LibJS: Make a slash after a curly close mean not-division There's no grammar rule that allows this.	2021-07-02 14:59:03 +02:00
Andreas Kling	49018553d3	LibJS+LibCrypto: Allow '_' as a numeric literal separator :^) This patch adds support for the NumericLiteralSeparator concept from the ECMAScript grammar.	2021-06-26 16:30:35 +02:00
Linus Groh	714a96619f	LibJS: Disallow whitespace or comments between regex literal and flags If we consumed whitespace and/or comments after a RegexLiteral token, the following token must not be RegexFlags - no whitespace or comments are allowed between the closing / and the flag characters. Fixes #8201.	2021-06-22 14:08:40 +01:00
Linus Groh	597cf88c08	LibJS: Implement the 'Hashbang Grammar for JS' proposal Stage 3 since August 2019 - we already have shebang stripping implemented in js(1), so this removes it from there in favor of adding support to the lexer directly. Most straightforward proposal and implementation I've ever seen :^) https://github.com/tc39/proposal-hashbang	2021-06-18 20:35:23 +01:00
Idan Horowitz	690eb3bb8a	LibJS: Add support for hex, octal & binary big integer literals	2021-06-14 01:45:04 +01:00
Andreas Kling	39ad705c13	LibJS: Use the new is_ascii_foo() helpers from AK These constexpr helpers generate nicer code than the LibC ctype.h variants, so let's make use of them. :^)	2021-06-13 19:11:29 +02:00
Gunnar Beutner	d476144565	Userland: Allow building SerenityOS with -funsigned-char Some of the code assumed that chars were always signed while that is not the case on ARM hosts. Also, some of the code tried to use EOF (-1) in a way similar to what fgetc() does, however instead of storing the characters in an int variable a char was used. While this seemed to work it also meant that character 0xFF would be incorrectly seen as an end-of-file. Careful reading of fgetc() reveals that fgetc() stores character data in an int where valid characters are in the range of 0-255 and the EOF value is explicitly outside of that range (usually -1).	2021-06-13 18:52:58 +02:00
Stephan Unverwerth	10ceeb092f	Everywhere: Use s.unverwerth@serenityos.org :^)	2021-05-29 12:30:08 +01:00
Linus Groh	ebdeed087c	Everywhere: Use linusg@serenityos.org for my copyright headers	2021-04-22 22:51:19 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Linus Groh	a178255a8b	LibJS: Use 'if constexpr' / dbgln_if() instead of '#if LEXER_DEBUG'	2021-04-18 18:14:50 +02:00
Jean-Baptiste Boric	0039ecb189	LibJS: Keep track of file names, lines and columns inside the AST	2021-03-01 11:14:36 +01:00
Andreas Kling	635a5eec75	LibJS: Remove a whole bunch of unnecessary #includes	2021-02-10 09:13:29 +01:00

53 commits