This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
- a
| - b
| | - c
| | | - <accept>
| | - <accept>
| - c
| | - <accept>
which will attempt to match 'a' or 'b' only once, and also limits the
amount of backtracking performed in case an alternative fails to match.
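As a rough sketch of the idea (assuming plain string alternatives; the
real optimiser works on bytecode blocks, and these names are
illustrative, not the actual LibRegex API):

    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    struct TrieNode {
        std::map<char, std::unique_ptr<TrieNode>> children;
        bool accepting { false }; // An alternative ends here (<accept>).
    };

    static TrieNode build_trie(std::vector<std::string> const& alternatives)
    {
        TrieNode root;
        for (auto const& alternative : alternatives) {
            auto* node = &root;
            for (char c : alternative) {
                auto& child = node->children[c];
                if (!child)
                    child = std::make_unique<TrieNode>();
                node = child.get();
            }
            node->accepting = true; // <accept>
        }
        return root;
    }

build_trie({ "abc", "ac", "ab" }) produces exactly the tree above:
shared prefixes collapse into single nodes, so each character is only
compared once per position.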
This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated. The model is pessimistic
for small patterns, though the performance difference for such patterns
is not particularly large either way.
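A minimal sketch of such a gate, with a made-up threshold and names
that are not the actual LibRegex implementation:

    #include <cstddef>

    // Hypothetical cost gate: only apply the trie rewrite when the
    // estimated instruction count stays below some threshold.
    static constexpr size_t max_trie_instruction_count_estimate = 1024; // Illustrative value.

    static bool should_apply_trie_optimisation(size_t estimated_instruction_count)
    {
        // The estimate is pessimistic for small patterns, but those see
        // little performance difference either way.
        return estimated_instruction_count <= max_trie_instruction_count_estimate;
    }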
This switches to using a simple string equality check if the regex
pattern is strictly a string literal.
Technically this optimisation could also be applied to bounded literal
patterns like /[abc]def/ or /abc|def/, but those are significantly more
complex to implement due to our bytecode-only approach.
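The string-literal fast path itself amounts to something like this
sketch (names illustrative, not the actual LibRegex API):

    #include <string_view>

    // If the parsed pattern is strictly a string literal (no
    // metacharacters, classes, or quantifiers), skip the bytecode
    // interpreter entirely.
    static bool matches_literal(std::string_view literal_pattern, std::string_view subject)
    {
        // An anchored match is plain equality; an unanchored search would
        // use subject.find(literal_pattern) != std::string_view::npos.
        return subject == literal_pattern;
    }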
If a block jumps before performing a compare, we'd need to recursively
find the first compare of the jumped-to block. While this is doable,
it's not really worth spending the time on, as most such cases won't
actually qualify for the atomic loop rewrite anyway.
Fixes an invalid rewrite when `.+` is followed by an alternation, e.g.
/.+(a|b|c)/.
Previously we were only checking for overlap when the range wasn't in
inverse mode, which made us miss patterns like /[^x]x/; this patch
performs the check in both modes so such cases are no longer missed.
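A sketch of the fixed check, assuming a compare is a set of character
ranges plus an 'inverse' flag (types and names are illustrative, not
the actual LibRegex internals):

    #include <vector>

    struct CharRange {
        unsigned from;
        unsigned to; // Inclusive.
    };

    static bool ranges_overlap(CharRange a, CharRange b)
    {
        return a.from <= b.to && b.from <= a.to;
    }

    static bool compares_overlap(std::vector<CharRange> const& lhs, bool lhs_inverse,
        std::vector<CharRange> const& rhs, bool rhs_inverse)
    {
        // An inverted compare like [^x] matches nearly everything, so
        // conservatively treat it as overlapping; previously this case was
        // skipped entirely, which missed patterns like /[^x]x/.
        if (lhs_inverse || rhs_inverse)
            return true;
        for (auto const& a : lhs) {
            for (auto const& b : rhs) {
                if (ranges_overlap(a, b))
                    return true;
            }
        }
        return false;
    }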
We had a really naive and simplistic implementation, which led to
various issues where the optimiser incorrectly rewrote the regex to use
atomic groups; this commit fixes that.
Previously we were jumping to the new end of the previous block (created
by the newly inserted ForkStay); this corrects the offset so we jump to
the intended block, as shown in the comments.
Fixes #12033.
...by flattening the underlying bytecode chunks first.
Also avoid calling DisjointChunks::size() inside a loop.
This is a very significant improvement in performance, making the
compilation of a large regex with lots of alternatives take only ~100ms
instead of many minutes (I ran out of patience waiting for it) :^)
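The shape of the change, sketched with stand-in types rather than the
actual LibRegex/AK types:

    #include <cstddef>
    #include <vector>

    using Bytecode = std::vector<int>; // Stand-in for the real instruction stream.

    static Bytecode flatten(std::vector<Bytecode> const& chunks)
    {
        size_t total_size = 0;
        for (auto const& chunk : chunks)
            total_size += chunk.size();

        Bytecode flattened;
        flattened.reserve(total_size);
        for (auto const& chunk : chunks)
            flattened.insert(flattened.end(), chunk.begin(), chunk.end());
        return flattened;
    }

    static void walk(std::vector<Bytecode> const& chunks)
    {
        auto flattened = flatten(chunks);
        // Hoisted out of the loop; recomputing the size on every iteration
        // was accidentally expensive with chunked storage.
        auto size = flattened.size();
        for (size_t i = 0; i < size; ++i) {
            // ... inspect flattened[i] ...
        }
    }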
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.
Also adds the minimal test case(s) from that issue.
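A sketch of block-level unification under these assumptions (Block is a
stand-in for a span of bytecode, not the actual LibRegex type):

    #include <cstddef>
    #include <map>
    #include <vector>

    using Block = std::vector<int>; // Stand-in for a basic block of bytecode.

    // Instructions like Repeat refer back to other positions, so two
    // sequences are only interchangeable when the entire blocks containing
    // them are equal; comparing instruction-by-instruction is not safe.
    static std::vector<size_t> unify_equal_blocks(std::vector<Block> const& blocks)
    {
        std::map<Block, size_t> first_occurrence;
        std::vector<size_t> remapping(blocks.size());
        for (size_t i = 0; i < blocks.size(); ++i) {
            auto it = first_occurrence.try_emplace(blocks[i], i).first;
            remapping[i] = it->second; // Duplicates point at the first copy.
        }
        return remapping;
    }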
The initial `ForkStay` is only needed if the looping block has a
following block. If there's no following block, or the following block
does not attempt to match anything, we should not insert the ForkStay;
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
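To illustrate why the unconditional ForkStay was wrong (simplified
bytecode sketch; the opcode names match LibRegex, but the layout here is
illustrative):

    ; `a+` must consume at least one 'a':
    loop:  Compare 'a'
           ForkJump loop     ; greedily try to match more
    end:   <accept>

    ; Unconditionally inserting the ForkStay lets execution skip the loop
    ; body entirely, i.e. `a+` would behave like `a*`:
           ForkStay end      ; wrong when nothing follows the loop
    loop:  Compare 'a'
           ForkJump loop
    end:   <accept>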
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
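A sketch of the lookup, assuming the ranges are sorted by start and
already merged (types are illustrative, not the actual match table):

    #include <algorithm>
    #include <vector>

    struct CharRange {
        unsigned from;
        unsigned to; // Inclusive.
    };

    static bool char_class_matches(std::vector<CharRange> const& ranges, unsigned code_point)
    {
        // Find the first range starting after code_point, then check
        // whether the range just before it covers code_point.
        auto it = std::upper_bound(ranges.begin(), ranges.end(), code_point,
            [](unsigned value, CharRange const& range) { return value < range.from; });
        if (it == ranges.begin())
            return false;
        --it;
        return code_point <= it->to;
    }

This turns each character class test from a linear scan over the class
entries into an O(log n) search over the compressed ranges.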
Previously we would've copied the bytecode instead of moving the chunks
around; this uses the fancy new DisjointChunks<T> abstraction to make
that happen automagically.
This decreases vector copies and uses of memmove() by nearly 10x :^)
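The gist of the abstraction as a simplified stand-in (not the actual
AK::DisjointChunks API):

    #include <utility>
    #include <vector>

    template<typename T>
    struct SimpleDisjointChunks {
        std::vector<std::vector<T>> chunks;

        void extend(SimpleDisjointChunks&& other)
        {
            // Concatenation just moves whole chunk buffers around; no
            // element is copied or memmove()d the way appending to one
            // flat vector would require.
            for (auto& chunk : other.chunks)
                chunks.push_back(std::move(chunk));
            other.chunks.clear();
        }
    };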