0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	1edb96376b	AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible These could fail to allocate the underlying storage needed to store the UTF-16 data. Propagate these errors.	2023-01-08 12:13:15 +01:00
Timothy Flynn	425c168ded	AK+LibJS+LibRegex: Define an alias for UTF-16 string data storage Instead of writing out "Vector<u16, 1>" everywhere, let's have a name for it.	2023-01-08 12:13:15 +01:00
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Ali Mohammad Pur	f1851346d3	LibRegex: Use a copy-on-write vector for fork state	2022-11-17 20:13:04 +03:30
Daniel Bertalan	4296425bd8	Everywhere: Remove redundant inequality comparison operators C++20 can automatically synthesize `operator!=` from `operator==`, so there is no point in writing such functions by hand if all they do is call through to `operator==`. This fixes a compile error with compilers that implement P2468 (Clang 16 currently). This paper restores the C++17 behavior that if both `T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators makes the rewriting possible again. See https://reviews.llvm.org/D134529#3853062	2022-11-06 10:25:08 -07:00
Gunnar Beutner	a650c74b27	AK+Toolchain: Make char and wchar_t behave on AARCH64 By default char and wchar_t are unsigned on AARCH64. This fixes a bunch of related compiler errors.	2022-10-14 13:01:13 +02:00
sin-ack	5422691f07	LibRegex: Remove RegexStringView(char const*) constructor This allowed passing in a nullptr for the StringView, which will not be possible once StringView(char const*) is removed.	2022-07-12 23:11:35 +02:00
Ali Mohammad Pur	aa20210119	LibRegex: Don't return empty vectors from RegexStringView::lines() Instead, return a vector of one empty string.	2022-01-26 00:53:09 +03:30
Ali Mohammad Pur	9de33629da	AK+Everywhere: Make Variant::visit() respect the Variant's constness ...and fix all the instances of visit() taking non-const arguments.	2022-01-14 11:35:40 +03:30
Ali Mohammad Pur	1a35e27490	LibRegex: Make FailForks fail all forks up to the last save point This makes negative lookarounds with more than one fork behave correctly. Fixes #11350.	2021-12-25 18:41:10 +01:00
Hendiadyoin1	0f4a79a24d	LibRegex: Capture `this` explicitly in RegexStringView::equals lambda This stops clang-tidy from suggesting that this function can be made static, even though the lambda accesses `this->operator==`.	2021-12-21 18:17:28 -08:00
Hendiadyoin1	5885e70df7	LibRegex: Remove some meaningless/useless const-qualifiers Also replace String creation from `""` with `String::empty()`	2021-12-21 18:17:28 -08:00
Hendiadyoin1	a2563496f5	LibRegex: Remove some else-after-returns	2021-12-21 18:17:28 -08:00
Andreas Kling	216e21a1fa	AK: Convert AK::Format formatting helpers to returning ErrorOr<void> This isn't a complete conversion to ErrorOr<void>, but a good chunk. The end goal here is to propagate buffer allocation failures to the caller, and allow the use of TRY() with formatting functions.	2021-11-17 00:21:13 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Andreas Kling	024367d82e	LibJS+AK: Use Vector<u16, 1> for UTF-16 string storage It's very common to encounter single-character strings in JavaScript on the web. We can make such strings significantly lighter by having a 1-character inline capacity on the Vectors.	2021-10-02 17:39:38 +02:00
Ali Mohammad Pur	246ab432ff	LibRegex: Add a basic optimization pass This currently tries to convert forking loops to atomic groups, and unify the left side of alternations.	2021-09-13 14:38:53 +04:30
Ali Mohammad Pur	88d148b46a	LibRegex: Avoid keeping track of checkpoints across forks Doing so would increase memory consumption by quite a bit, since many useless copies of the checkpoints hashmap would be created and later thrown away.	2021-09-06 18:21:13 +04:30
Ali Mohammad Pur	abbe9da255	LibRegex: Make infinite repetitions short-circuit on empty matches This makes (admittedly weird) patterns like `(a*)*` work correctly without going into an infinite fork loop.	2021-09-06 13:51:30 +04:30
Idan Horowitz	e8f6840471	AK+LibRegex: Disable construction of views from temporary Strings	2021-09-04 21:01:15 +02:00
Timothy Flynn	c5b5c779ff	LibRegex+LibJS: Change capture group names from a String to a FlyString The parser now stores this as a FlyString everywhere, so consumers can also use it as a FlyString.	2021-08-19 23:49:25 +02:00
Timothy Flynn	325eabc770	LibRegex: Ensure the GoBack operation decrements the code unit index This was missed in commit `27d555bab0`.	2021-08-18 09:47:09 +04:30
Timothy Flynn	9509433e25	LibRegex: Implement and use a REPEAT operation for bytecode repetition Currently, when we need to repeat an instruction N times, we simply add that instruction N times in a for-loop. This doesn't scale well with extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1. Instead, add a new REPEAT bytecode operation to defer this loop from the parser to the runtime executor. This allows the parser to complete sans any loops (for this instruction), and allows the executor to bail early if the repeated bytecode fails. Note: The templated ByteCode methods are to allow the Posix parsers to continue using u32 because they are limited to N = 2^20.	2021-08-15 11:43:45 +01:00
Timothy Flynn	a0b72f5ad3	LibRegex: Remove (mostly) unused regex::MatchOutput This struct holds a counter for the number of executed operations, and vectors for matches, capture groups, and named capture groups. Each of the vectors is unused. Remove the struct and just keep a separate counter for the executed operations.	2021-08-15 11:43:45 +01:00
Timothy Flynn	f1ce998d73	LibRegex+LibJS: Combine named and unnamed capture groups in MatchState Combining these into one list helps reduce the size of MatchState, and as a result, reduces the amount of memory consumed during execution of very large regex matches. Doing this also allows us to remove a few regex byte code instructions: ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference. Named groups now behave the same as unnamed groups for these operations. Note that SaveRightNamedCaptureGroup still exists to cache the matched group name. This also removes the recursion level from the MatchState, as it can exist as a local variable in Matcher::execute instead.	2021-08-15 11:43:45 +01:00
Timothy Flynn	27d555bab0	LibRegex: Track string position in both code units and code points In non-Unicode mode, the existing MatchState::string_position is tracked in code units; in Unicode mode, it is tracked in code points. In order for some RegexStringView operations to be performant, it is useful for the MatchState to have a field to always track the position in code units. This will allow RegexStringView methods (e.g. operator[]) to perform lookups based on code unit offsets, rather than needing to iterate over the entire string to find a code point offset.	2021-08-04 11:18:24 +02:00
Timothy Flynn	510bbcd8e0	AK+LibRegex: Add Utf16View::code_point_at and use it in RegexStringView The current method of iterating through the string to access a code point hurts performance quite badly for very large strings. The test262 test "RegExp/property-escapes/generated/Any.js" previously took 3 hours to complete; this one change brings it down to under 10 seconds.	2021-08-04 11:18:24 +02:00
Ali Mohammad Pur	5f342e4fa9	LibRegex: Make Fork{Jump,Stay} non-recursive This makes very fork-heavy expressions (like `(aa)*`) not run out of stack space when matching very long strings.	2021-08-02 17:22:50 +04:30
Ali Mohammad Pur	1dd1378159	LibRegex: Preserve the type of the match when clearing capture groups Even though the contents are supposed to be reset, the type should stay unchanged, as that's an assumption the engine is making.	2021-07-24 20:52:43 +04:30
Timothy Flynn	0e6375558d	AK+LibRegex: Partially implement case insensitive UTF-16 comparison This will work for ASCII code points. Unicode case folding will be needed for non-ASCII.	2021-07-23 23:06:57 +01:00
Timothy Flynn	47f6bb38a1	LibRegex: Support UTF-16 RegexStringView and improve Unicode matching When the Unicode option is not set, regular expressions should match based on code units; when it is set, they should match based on code points. To do so, the regex parser must combine surrogate pairs when the Unicode option is set. Further, RegexStringView needs to know if the flag is set in order to return code point vs. code unit based string lengths and substrings.	2021-07-23 23:06:57 +01:00
Ali Mohammad Pur	36bfc912fc	LibRegex: Switch to east-const style	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	f364fcec5d	LibRegex+Everywhere: Make LibRegex more unicode-aware This commit makes LibRegex (mostly) capable of operating on any of the three main string views: - StringView for raw strings - Utf8View for utf-8 encoded strings - Utf32View for raw unicode strings As a result, regexps with unicode strings should be able to properly handle utf-8 and not stop in the middle of a code point. A future commit will update LibJS to use the correct type of string depending on the flags.	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	2961982277	LibRegex: Use <...> includes in RegexMatch.h	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	da1fda73a7	LibRegex: Implement line splitting for Utf32View Co-authored-by: Timothy Flynn <trflynn89@pm.me>	2021-07-18 21:10:55 +04:30
sin-ack	74d76528d6	LibRegex: Display correct position for Compare in REGEX_DEBUG When REGEX_DEBUG is enabled, LibRegex dumps a table of information regarding the state of the regex bytecode execution. The Compare opcode manipulates state.string_position directly, so the string_position value cannot be used to display where the comparison started; therefore, this patch introduces a new variable to keep track of where we were before the comparison happened.	2021-06-16 16:30:12 +04:30
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
AnotherTest	f05e518cbc	LibRegex: Implement section B.1.4. of the ECMA262 spec This allows the parser to deal with crazy patterns like the one in #5517.	2021-02-27 07:31:01 +01:00
Andreas Kling	5d180d1f99	Everywhere: Rename ASSERT => VERIFY (...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.	2021-02-23 20:56:54 +01:00
asynts	5c5665c1e7	Everywhere: Replace a bundle of dbg with dbgln. These changes are arbitrarily divided into multiple commits to make it easier to find potentially introduced bugs with git bisect.	2021-01-22 22:14:30 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00

42 commits