This should fix the flaky tests of test-js.
It also fixes the tests when running with the -g flag since the values
will not be garbage collected too soon.
Note that the algorithm in the Unicode spec is for checking that a code
point precedes U+0307, but the special casing condition NotBeforeDot is
interested in the inverse of this rule.
Executing 100 times vs 10 times doesn't increase test coverage
substantial, this API is stable and more iterations is just a
waste of time.
Without KVM this test is a clear outlier in runtime during CI:
```
START LibC/TestQsort (106/172)
PASS LibC/TestQsort (41.233692s)
```
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:
auto value = hash_map.get(key).value_or(fallback_value);
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
Calendar subtags are a bit of an odd-man-out in that we must match the
variants "ethiopic-amete-alem" in that order, without any other variant
in the locale. So a separate method is needed for this, and we now defer
sorting the variant list until after other canonicalization is done.
Unicode TR35 defines how locale subtag aliases should be emplaced when
converting a locale to canonical form. For most subtags, it is a simple
substitution. Language subtags depend on context; for example, the
language "sh" should become "sr-Latn", but if the original locale has a
script subtag already ("sh-Cyrl"), then only the language subtag of the
alias should be taken ("sr-Latn").
To facilitate this, we now make two passes when canonicalizing a locale.
In the first pass, we convert the LocaleID structure to canonical syntax
(where the conversions all happen in-place). In the second pass, we form
the canonical string based on the canonical syntax.
Originally, it was convenient to store the parsed Unicode locale data as
views into the original string being parsed. But to implement locale
aliases will require mutating the data that was parsed. To prepare for
that, store the parsed data as proper strings.
Some i64 values will not fit in normal doubles, and these values _are_
tested by the test suite, this makes the test runtime capable of
handling them correctly.
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
ECMA-402 requires validating user input against the EBNF grammar for
Unicode locales described in TR-35: https://www.unicode.org/reports/tr35
This commit adds validators for that grammar, as well as other helper to
e.g. canonicalize a locale string.
Since there are no real users of these functions in Serenity's
userland and this is my third attempt at this... This time, the great
LibTest test suite will make sure that I do it right!
Classes reading and writing to the data heap would communicate directly
with the Heap object, and transfer ByteBuffers back and forth with it.
This makes things like caching and locking hard. Therefore all data
persistence activity will be funneled through a Serializer object which
in turn submits it to the Heap.
Introducing this unfortunately resulted in a huge amount of churn, in
which a number of smaller refactorings got caught up as well.
This patch provides very basic, bare bones implementations of the
INSERT and SELECT statements. They are *very* limited:
- The only variant of the INSERT statement that currently works is
SELECT INTO schema.table (column1, column2, ....) VALUES
(value11, value21, ...), (value12, value22, ...), ...
where the values are literals.
- The SELECT statement is even more limited, and is only provided to
allow verification of the INSERT statement. The only form implemented
is: SELECT * FROM schema.table
These statements required a bit of change in the Statement::execute
API. Originally execute only received a Database object as parameter.
This is not enough; we now pass an ExecutionContext object which
contains the Database, the current result set, and the last Tuple read
from the database. This object will undoubtedly evolve over time.
This API change dragged SQLServer::SQLStatement into the patch.
Another API addition is Expression::evaluate. This method is,
unsurprisingly, used to evaluate expressions, like the values in the
INSERT statement.
Finally, a new test file is added: TestSqlStatementExecution, which
tests the currently implemented statements. As the number and flavour of
implemented statements grows, this test file will probably have to be
restructured.
The implemtation of the Value class was based on lambda member variables
implementing type-dependent behaviour. This was done to ensure that
Values can be used as stack-only objects; the simplest alternative,
virtual methods, forces them onto the heap. The problem with the the
lambda approach is that it bloats the Values (which are supposed to be
lightweight objects) quite considerably, because every object contains
more than a dozen function pointers.
The solution to address both problems (we want Values to be able to live
on the stack and be as lightweight as possible) chosen here is to
encapsulate type-dependent behaviour and state in an implementation
class, and let the Value be an AK::Variant of those implementation
classes. All methods of Value are now basically straight delegates to
the implementation object using the Variant::visit method.
One issue complicating matters is the addition of two aggregate types,
Tuple and Array, which each contain a Vector of Values. At this point
Tuples and Arrays (and potential future aggregate types) can't contain
these aggregate types. This is limiting and needs to be addressed.
Another area that needs attention is the nomenclature of things; it's
a bit of a tangle of 'ValueBlahBlah' and 'ImplBlahBlah'. It makes sense
right now I think but admit we probably can do better.
Other things included here:
- Added the Boolean and Null types (and Tuple and Array, see above).
- to_string now always succeeds and returns a String instead of an
Optional. This had some impact on other sources.
- Added a lot of tests.
- Started moving the serialization mechanism more towards where I want
it to be, i.e. a 'DataSerializer' object which just takes
serialization and deserialization requests and knows for example how
to store long strings out-of-line.
One last remark: There is obviously a naming clash between the Tuple
class and the Tuple Value type. This is intentional; I plan to make the
Tuple class a subclass of Value (and hence Key and Row as well).
For example, consider the following pattern:
new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.