This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
- a
| - b
| | - c
| | | - <accept>
| | - <accept>
| - c
| | - <accept>
which will attempt to match 'a' or 'b' only once, and also limits the
amount of backtracking performed in case an alternative fails to match.
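As a rough sketch of the idea (assuming plain string alternatives; the
real optimiser works on bytecode blocks, and these names are
illustrative, not the actual LibRegex API):

    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    struct TrieNode {
        std::map<char, std::unique_ptr<TrieNode>> children;
        bool accepting { false }; // An alternative ends here (<accept>).
    };

    static TrieNode build_trie(std::vector<std::string> const& alternatives)
    {
        TrieNode root;
        for (auto const& alternative : alternatives) {
            auto* node = &root;
            for (char c : alternative) {
                auto& child = node->children[c];
                if (!child)
                    child = std::make_unique<TrieNode>();
                node = child.get();
            }
            node->accepting = true; // <accept>
        }
        return root;
    }

build_trie({ "abc", "ac", "ab" }) produces exactly the tree above:
shared prefixes collapse into single nodes, so each character is only
compared once per position.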
This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated. The model is pessimistic
for small patterns, though the performance difference for such patterns
is not particularly large either way.
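A minimal sketch of such a gate, with a made-up threshold and names
that are not the actual LibRegex implementation:

    #include <cstddef>

    // Hypothetical cost gate: only apply the trie rewrite when the
    // estimated instruction count stays below some threshold.
    static constexpr size_t max_trie_instruction_count_estimate = 1024; // Illustrative value.

    static bool should_apply_trie_optimisation(size_t estimated_instruction_count)
    {
        // The estimate is pessimistic for small patterns, but those see
        // little performance difference either way.
        return estimated_instruction_count <= max_trie_instruction_count_estimate;
    }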
This switches to using a simple string equality check if the regex
pattern is strictly a string literal.
Technically this optimisation could also be applied to bounded literal
patterns like /[abc]def/ or /abc|def/, but those are significantly more
complex to implement due to our bytecode-only approach.
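The string-literal fast path itself amounts to something like this
sketch (names illustrative, not the actual LibRegex API):

    #include <string_view>

    // If the parsed pattern is strictly a string literal (no
    // metacharacters, classes, or quantifiers), skip the bytecode
    // interpreter entirely.
    static bool matches_literal(std::string_view literal_pattern, std::string_view subject)
    {
        // An anchored match is plain equality; an unanchored search would
        // use subject.find(literal_pattern) != std::string_view::npos.
        return subject == literal_pattern;
    }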
If a block jumps before performing a compare, we'd need to recursively
find the first compare of the jumped-to block. While this is doable,
it's not really worth spending the time on, as most such cases won't
actually qualify for the atomic loop rewrite anyway.
Fixes an invalid rewrite when `.+` is followed by an alternation, e.g.
/.+(a|b|c)/.
Previously we were only checking for overlap when the range wasn't in
inverse mode, which made us miss patterns like /[^x]x/; this patch
performs the check in both modes so such cases are no longer missed.
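A sketch of the fixed check, assuming a compare is a set of character
ranges plus an 'inverse' flag (types and names are illustrative, not
the actual LibRegex internals):

    #include <vector>

    struct CharRange {
        unsigned from;
        unsigned to; // Inclusive.
    };

    static bool ranges_overlap(CharRange a, CharRange b)
    {
        return a.from <= b.to && b.from <= a.to;
    }

    static bool compares_overlap(std::vector<CharRange> const& lhs, bool lhs_inverse,
        std::vector<CharRange> const& rhs, bool rhs_inverse)
    {
        // An inverted compare like [^x] matches nearly everything, so
        // conservatively treat it as overlapping; previously this case was
        // skipped entirely, which missed patterns like /[^x]x/.
        if (lhs_inverse || rhs_inverse)
            return true;
        for (auto const& a : lhs) {
            for (auto const& b : rhs) {
                if (ranges_overlap(a, b))
                    return true;
            }
        }
        return false;
    }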
We had a really naive and simplistic implementation, which led to
various issues where the optimiser incorrectly rewrote the regex to use
atomic groups; this commit fixes that.
Previously we were jumping to the new end of the previous block (created
by the newly inserted ForkStay); this corrects the offset so we jump to
the intended block, as shown in the comments.
Fixes #12033.
...by flattening the underlying bytecode chunks first.
Also avoid calling DisjointChunks::size() inside a loop.
This is a very significant improvement in performance, making the
compilation of a large regex with lots of alternatives take only ~100ms
instead of many minutes (I ran out of patience waiting for it) :^)
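The shape of the change, sketched with stand-in types rather than the
actual LibRegex/AK types:

    #include <cstddef>
    #include <vector>

    using Bytecode = std::vector<int>; // Stand-in for the real instruction stream.

    static Bytecode flatten(std::vector<Bytecode> const& chunks)
    {
        size_t total_size = 0;
        for (auto const& chunk : chunks)
            total_size += chunk.size();

        Bytecode flattened;
        flattened.reserve(total_size);
        for (auto const& chunk : chunks)
            flattened.insert(flattened.end(), chunk.begin(), chunk.end());
        return flattened;
    }

    static void walk(std::vector<Bytecode> const& chunks)
    {
        auto flattened = flatten(chunks);
        // Hoisted out of the loop; recomputing the size on every iteration
        // was accidentally expensive with chunked storage.
        auto size = flattened.size();
        for (size_t i = 0; i < size; ++i) {
            // ... inspect flattened[i] ...
        }
    }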
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.
Also adds the minimal test case(s) from that issue.
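A sketch of block-level unification under these assumptions (Block is a
stand-in for a span of bytecode, not the actual LibRegex type):

    #include <cstddef>
    #include <map>
    #include <vector>

    using Block = std::vector<int>; // Stand-in for a basic block of bytecode.

    // Instructions like Repeat refer back to other positions, so two
    // sequences are only interchangeable when the entire blocks containing
    // them are equal; comparing instruction-by-instruction is not safe.
    static std::vector<size_t> unify_equal_blocks(std::vector<Block> const& blocks)
    {
        std::map<Block, size_t> first_occurrence;
        std::vector<size_t> remapping(blocks.size());
        for (size_t i = 0; i < blocks.size(); ++i) {
            auto it = first_occurrence.try_emplace(blocks[i], i).first;
            remapping[i] = it->second; // Duplicates point at the first copy.
        }
        return remapping;
    }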
The initial `ForkStay` is only needed if the looping block has a
following block. If there's no following block, or the following block
does not attempt to match anything, we should not insert the ForkStay;
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
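To illustrate why the unconditional ForkStay was wrong (simplified
bytecode sketch; the opcode names match LibRegex, but the layout here is
illustrative):

    ; `a+` must consume at least one 'a':
    loop:  Compare 'a'
           ForkJump loop     ; greedily try to match more
    end:   <accept>

    ; Unconditionally inserting the ForkStay lets execution skip the loop
    ; body entirely, i.e. `a+` would behave like `a*`:
           ForkStay end      ; wrong when nothing follows the loop
    loop:  Compare 'a'
           ForkJump loop
    end:   <accept>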
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
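A sketch of the lookup, assuming the ranges are sorted by start and
already merged (types are illustrative, not the actual match table):

    #include <algorithm>
    #include <vector>

    struct CharRange {
        unsigned from;
        unsigned to; // Inclusive.
    };

    static bool char_class_matches(std::vector<CharRange> const& ranges, unsigned code_point)
    {
        // Find the first range starting after code_point, then check
        // whether the range just before it covers code_point.
        auto it = std::upper_bound(ranges.begin(), ranges.end(), code_point,
            [](unsigned value, CharRange const& range) { return value < range.from; });
        if (it == ranges.begin())
            return false;
        --it;
        return code_point <= it->to;
    }

This turns each character class test from a linear scan over the class
entries into an O(log n) search over the compressed ranges.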
Previously we would've copied the bytecode instead of moving the chunks
around; this uses the fancy new DisjointChunks<T> abstraction to make
that happen automagically.
This decreases vector copies and uses of memmove() by nearly 10x :^)
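The gist of the abstraction as a simplified stand-in (not the actual
AK::DisjointChunks API):

    #include <utility>
    #include <vector>

    template<typename T>
    struct SimpleDisjointChunks {
        std::vector<std::vector<T>> chunks;

        void extend(SimpleDisjointChunks&& other)
        {
            // Concatenation just moves whole chunk buffers around; no
            // element is copied or memmove()d the way appending to one
            // flat vector would require.
            for (auto& chunk : other.chunks)
                chunks.push_back(std::move(chunk));
            other.chunks.clear();
        }
    };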