ECMA262 requires that the capture groups only contain the values from
the last iteration, e.g. `((c)(a)?(b))` should _not_ contain 'a' in the
second capture group when matching "cabcb".
This commit makes LibRegex (mostly) capable of operating on any of
the three main string views:
- StringView for raw strings
- Utf8View for utf-8 encoded strings
- Utf32View for raw unicode strings
As a result, regexps with unicode strings should be able to properly
handle utf-8 and not stop in the middle of a code point.
A future commit will update LibJS to use the correct type of string
depending on the flags.
If the sticky flag is set, the regex execution loop should break
immediately even if the execution was a failure. The specification for
several RegExp.prototype methods (e.g. exec and @@split) rely on this
behavior.
When REGEX_DEBUG is enabled, LibRegex dumps a table of information
regarding the state of the regex bytecode execution. The Compare opcode
manipulates state.string_position directly, so the string_position value
cannot be used to display where the comparison started; therefore, this
patch introduces a new variable to keep track of where we were before
the comparison happened.
Previously this would return a pointer which could be null if the
requested opcode was invalid. This should never be the case though
so let's VERIFY() that instead.
Some of the code assumed that chars were always signed while that is
not the case on ARM hosts.
Also, some of the code tried to use EOF (-1) in a way similar to what
fgetc() does, however instead of storing the characters in an int
variable a char was used.
While this seemed to work it also meant that character 0xFF would be
incorrectly seen as an end-of-file.
Careful reading of fgetc() reveals that fgetc() stores character
data in an int where valid characters are in the range of 0-255 and
the EOF value is explicitly outside of that range (usually -1).
I have no idea *why*, but this stopped working suddenly:
return { { .code_point = '-', .is_character_class = false } };
Fails with:
error: could not convert ‘{{'-', false}}’ from
‘<brace-enclosed initializer list>’ to
‘AK::Optional<regex::CharClassRangeElement>
Might be related to 66f15c2 somehow, going one past that commit makes
the build work again, however reverting the commit doesn't. Not sure
what's up with that.
Consider this patch a band-aid until we can find the reason and an
actual fix...
Compiler version:
gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3)
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
Problem:
- `static` variables consume memory and sometimes are less
optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
every time the function is run.
Solution:
- If a global `static` variable is only used in a single function then
move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
`constexpr`.
As many macros as possible are moved to Macros.h, while the
macros to create a test case are moved to TestCase.h. TestCase is now
the only user-facing header for creating a test case. TestSuite and its
helpers have moved into a .cpp file. Instead of requiring a TEST_MAIN
macro to be instantiated into the test file, a TestMain.cpp file is
provided instead that will be linked against each test. This has the
side effect that, if we wanted to have test cases split across multiple
files, it's as simple as adding them all to the same executable.
The test main should be portable to kernel mode as well, so if
there's a set of tests that should be run in self-test mode in kernel
space, we can accomodate that.
A new serenity_test CMake function streamlines adding a new test with
arguments for the test source file, subdirectory under /usr/Tests to
install the test application and an optional list of libraries to link
against the test application. To accomodate future test where the
provided TestMain.cpp is not suitable (e.g. test-js), a CUSTOM_MAIN
parameter can be passed to the function to not link against the
boilerplate main function.