beenull/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-22 23:50:19 +00:00

Author	SHA1	Message	Date
Ali Mohammad Pur	bf0315ff8f	LibRegex: Avoid excessive Vector copy when compiling regexps Previously we would've copied the bytecode instead of moving the chunks around, use the fancy new DisjointChunks<T> abstraction to make that happen automagically. This decreases vector copies and uses of memmove() by nearly 10x :^)	2021-09-14 21:33:15 +04:30
Ali Mohammad Pur	246ab432ff	LibRegex: Add a basic optimization pass This currently tries to convert forking loops to atomic groups, and unify the left side of alternations.	2021-09-13 14:38:53 +04:30
Ali Mohammad Pur	abbe9da255	LibRegex: Make infinite repetitions short-circuit on empty matches This makes (admittedly weird) patterns like `(a)` work correctly without going into an infinite fork loop.	2021-09-06 13:51:30 +04:30
Ali Mohammad Pur	dd82c2e9b4	LibRegex: Correctly handle failing in the middle of explicit repeats - Make sure that all the Repeat ops are reset (otherwise the operation would not be correct when going over the Repeat op a second time) - Make sure that all matches that are allowed to fail are backed by a fork, otherwise the last failing fork would not have anywhere to return to. Fixes #9707.	2021-09-01 13:36:53 +02:00
Ali Mohammad Pur	98624fe03f	LibRegex: Implement min/max repetition using the Repeat bytecode This makes repetitions with large max bounds work correctly. Also fixes an OOM issue found by OSS-Fuzz: https://oss-fuzz.com/testcase?key=4725721980338176	2021-08-31 16:37:49 +02:00
Timothy Flynn	9509433e25	LibRegex: Implement and use a REPEAT operation for bytecode repetition Currently, when we need to repeat an instruction N times, we simply add that instruction N times in a for-loop. This doesn't scale well with extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1. Instead, add a new REPEAT bytecode operation to defer this loop from the parser to the runtime executor. This allows the parser to complete sans any loops (for this instruction), and allows the executor to bail early if the repeated bytecode fails. Note: The templated ByteCode methods are to allow the Posix parsers to continue using u32 because they are limited to N = 2^20.	2021-08-15 11:43:45 +01:00
Timothy Flynn	a0b72f5ad3	LibRegex: Remove (mostly) unused regex::MatchOutput This struct holds a counter for the number of executed operations, and vectors for matches, captures groups, and named capture groups. Each of the vectors is unused. Remove the struct and just keep a separate counter for the executed operations.	2021-08-15 11:43:45 +01:00
Timothy Flynn	f1ce998d73	LibRegex+LibJS: Combine named and unnamed capture groups in MatchState Combining these into one list helps reduce the size of MatchState, and as a result, reduces the amount of memory consumed during execution of very large regex matches. Doing this also allows us to remove a few regex byte code instructions: ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference. Named groups now behave the same as unnamed groups for these operations. Note that SaveRightNamedCaptureGroup still exists to cache the matched group name. This also removes the recursion level from the MatchState, as it can exist as a local variable in Matcher::execute instead.	2021-08-15 11:43:45 +01:00
Timothy Flynn	484ccfadc3	LibRegex: Support property escapes of Unicode script extensions	2021-08-04 13:50:32 +01:00
Timothy Flynn	06088df729	LibRegex: Support property escapes of the Unicode script property Note that unlike binary properties and general categories, scripts must be specified in the non-binary (Script=Value) form.	2021-08-04 13:50:32 +01:00
Timothy Flynn	1e10d6d7ce	LibRegex: Support property escapes of Unicode General Categories This changes LibRegex to parse the property escape as a Variant of Unicode Property & General Category values. A byte code instruction is added to perform matching based on General Category values.	2021-08-02 21:02:09 +04:30
Timothy Flynn	d485cf29d7	LibRegex+LibUnicode: Begin implementing Unicode property escapes This supports some binary property matching. It does not support any properties not yet parsed by LibUnicode, nor does it support value matching (such as Script_Extensions=Latin).	2021-07-30 21:26:31 +01:00
Ali Mohammad Pur	36bfc912fc	LibRegex: Switch to east-const style	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	c8b2199251	LibRegex: Clear previous capture group contents in ECMA262 mode ECMA262 requires that the capture groups only contain the values from the last iteration, e.g. `((c)(a)?(b))` should _not_ contain 'a' in the second capture group when matching "cabcb".	2021-07-23 21:19:21 +04:30
Gunnar Beutner	36e36507d5	Everywhere: Prefer using {:#x} over 0x{:x} We have a dedicated format specifier which adds the "0x" prefix, so let's use that instead of adding it manually.	2021-07-22 08:57:01 +02:00
Ali Mohammad Pur	f364fcec5d	LibRegex+Everywhere: Make LibRegex more unicode-aware This commit makes LibRegex (mostly) capable of operating on any of the three main string views: - StringView for raw strings - Utf8View for utf-8 encoded strings - Utf32View for raw unicode strings As a result, regexps with unicode strings should be able to properly handle utf-8 and not stop in the middle of a code point. A future commit will update LibJS to use the correct type of string depending on the flags.	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	addfa1e82e	LibRegex: Make the bytecode transformation functions static They were pretty confusing when compared with other non-transforming functions.	2021-07-10 13:33:08 +02:00
Andreas Kling	e59bf87374	Userland: Replace VERIFY(is<T>) with verify_cast<T> Instead of doing a VERIFY(is<T>(x)) and then casting it to T, we can just do the cast right away with verify_cast<T>. :^)	2021-06-24 21:13:09 +02:00
Gunnar Beutner	5bfe601152	LibRegex: Remove unused code	2021-06-14 16:09:58 +04:30
Gunnar Beutner	a167941852	LibRegex: Use a plain pointer for OpCode::m_state	2021-06-14 16:09:58 +04:30
Gunnar Beutner	d3c2a3caea	LibRegex: Avoid initialization checks in get_opcode_by_id()	2021-06-14 16:09:58 +04:30
Gunnar Beutner	281f39073d	LibRegex: Make get_opcode() return a reference Previously this would return a pointer which could be null if the requested opcode was invalid. This should never be the case though so let's VERIFY() that instead.	2021-06-14 16:09:58 +04:30
Gunnar Beutner	cd49fb0229	LibRegex: Remove return value for setters	2021-06-14 16:09:58 +04:30
Gunnar Beutner	1fb4471506	LibRegex: Use a plain array to store opcodes Using a hash map is unnecessary because the number of opcodes and their IDs never change.	2021-06-14 16:09:58 +04:30
Andreas Kling	dc65f54c06	AK: Rename Vector::append(Vector) => Vector::extend(Vector) Let's make it a bit more clear when we're appending the elements from one vector to the end of another vector.	2021-06-12 13:24:45 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Andreas Kling	c68dcf45b6	LibRegex: Convert String::format() => String::formatted()	2021-04-21 23:49:02 +02:00
AnotherTest	e9279d1790	LibRegex: Allow a '?' suffix for brace quantifiers This fixes another compat point in #6042.	2021-04-10 09:16:03 +02:00
AnotherTest	8d7bcc2476	LibRegex: Give ByteCode a copy ctor and a move assignment operator Previously all move assignments were actually copies. oops.	2021-04-10 09:16:03 +02:00
AnotherTest	0f468a5013	LibRegex: Test alternatives in the expected order That is, first try to match the left side of the alternation, and then the right side. Fixes part of #6042.	2021-04-01 21:55:47 +02:00
AnotherTest	6bbb26fdaf	LibRegex: Allow references to capture groups that aren't parsed yet This only applies to the ECMA262 parser. This behaviour is an ECMA262-specific quirk, such references always generate zero-length matches (even on subsequent passes). Also adds a test in LibJS's test suite. Fixes #6039.	2021-04-01 21:55:47 +02:00
Andreas Kling	5d180d1f99	Everywhere: Rename ASSERT => VERIFY (...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.	2021-02-23 20:56:54 +01:00
Linus Groh	421587c15c	Everywhere: Fix typos	2021-01-22 18:41:29 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00

34 commits
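
Commit 9509433e25 in the list above describes replacing "emit the instruction N times" with a single REPEAT operation that the executor loops over at runtime. The Python sketch below illustrates that idea only. The opcode names, tuple layout, and the functions `compile_unrolled`, `compile_repeat`, and `run` are hypothetical and are not LibRegex's actual bytecode or API.

```python
# Toy illustration (NOT LibRegex's real bytecode): compare unrolling a
# repetition at compile time with deferring it to a runtime REPEAT op.

def compile_unrolled(ch, n):
    """Emit the CHAR instruction n times; program size grows with n."""
    return [("CHAR", ch)] * n


def compile_repeat(ch, n):
    """Emit the body once, followed by one REPEAT op; size is constant."""
    if n == 0:
        return []  # nothing to match, so emit no instructions
    # ("REPEAT", body_len, count): jump back over body_len instructions
    # until the body has executed count times in total.
    return [("CHAR", ch), ("REPEAT", 1, n)]


def run(program, text):
    """Return the number of characters consumed, or None on failure."""
    pc = 0          # index of the current instruction
    pos = 0         # current position in the input text
    counters = {}   # per-REPEAT iteration counters, keyed by instruction index
    while pc < len(program):
        op = program[pc]
        if op[0] == "CHAR":
            if pos >= len(text) or text[pos] != op[1]:
                return None  # bail out as soon as the repeated body fails
            pos += 1
            pc += 1
        elif op[0] == "REPEAT":
            _, body_len, count = op
            done = counters.get(pc, 0) + 1
            if done < count:
                counters[pc] = done
                pc -= body_len   # run the body again
            else:
                counters[pc] = 0  # reset so the op can be re-entered later
                pc += 1
    return pos


if __name__ == "__main__":
    print(len(compile_unrolled("a", 1000)), len(compile_repeat("a", 1000)))
    print(run(compile_repeat("a", 3), "aaab"))
```

In this toy model the unrolled program needs one instruction per repetition, while the REPEAT form stays at two instructions regardless of N. That is the scaling concern the commit message raises for bounds as large as 2^53 - 1.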