Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
We weren't properly iterating the extension blocks and thought we
encountered an unexpected extension map block, when we really should
have just skipped over it.
Reverts recent change introduced to support implicit symbolic permission
which broke the parser when multiple classes are specified.
The state machine must assume it's dealing with classes until an
operation character is consumed.
This library can be used (for the most part) by kernel drivers as well
as user mode. For this reason FixedPoint is used rather than floating
point, but kept to a minimum.
Rather than casting the FixedPoint to double, format the FixedPoint
directly. This avoids using floating point instruction, which in
turn enables this to be used even in the kernel.
The implementation of LIKE uses regexes under the hood, and this
implementation of REGEXP takes the same approach. It employs
PosixExtended from LibRegex with case insensitive and Unicode flags
set. The implementation of LIKE is based on SQLlite specs, but SQLlite
does not offer directions for a built-in regex functionality, so this
one uses LibRegex.
The event loop system was previously very singletony to the point that
there's only a single event loop stack per process and only one event
loop (the topmost) can run at a time. This commit simply makes the event
loop stack and related structures thread-local so that each thread has
an isolated event loop system.
Some things are kept at a global level and synchronized with the new
MutexProtected: The main event loop needs to still be obtainable from
anywhere, as it closes down the application when it exits. The ID
allocator is global as IDs should not be shared even between threads.
And for the inspector server connection, the same as for the main loop
holds.
Note that currently, the wake pipe is only created by the main thread,
so notifications don't work on other threads.
This removes the temporary mutex fix for notifiers, introduced in
0631d3fed5 .
This is no longer needed now that LibTimeZone is included within LibC.
Remove the direct linkage so that others do not mistakenly copy-paste
the CMakeLists text elsewhere.
Instead of leaking all capture groups and selectively clearing some,
simply avoid leaking things and only "define" the ones that need to
exist.
This *actually* implements the capture groups ECMA262 quirk.
Also adds the test removed in the previous commit (to avoid messing up
test runs across bisects).
This partially reverts commit c11be92e23.
That commit fixes one thing and breaks many more, a next commit will
implement this quirk in a more sane way.
Previously we were jumping to the new end of the previous block (created
by the newly inserted ForkStay), correct the offset to jump to the
correct block as shown in the comments.
Fixes#12033.
This test makes sure that Socket classes such as TCPSocket properly
return an error when connection fails rather than crashing or creating
an invalid object.
Accidentally regressed this test during the Core::LocalServer refactor,
and didn't catch it since TestLibCoreStream is disabled in the CI right
now. We have to wait for some data to become available, as pending_bytes
will immediately return 0 and a 0-sized read immediately returns.
We went through some trouble to make & and | work right. Reimplement ^
in terms of & and | to make ^ work right as well.
This is less fast than a direct implementation, but let's get things
working first.
Similar to the bitwise_and change, but we have to be careful to
sign-extend two's complement numbers only up to the highest set bit
in the positive number.
Bitwise and is defined in terms of two's complement, so some converting
needs to happen for SignedBigInteger's sign/magnitude representation to
work out.
UnsignedBigInteger::bitwise_not() is repurposed to convert all
high-order zero bits to ones up to a limit, for the two's complement
conversion to work.
Fixes test262/test/language/expressions/bitwise-and/bigint.js.
Bitwise operators are defined on two's complement, but SignedBitInteger
uses sign-magnitude. Correctly convert between the two.
Let LibJS delegate to SignedBitInteger for bitwise_not, like it does
for all other bitwise_ operations on bigints.
No behavior change (LibJS is now the only client of
SignedBitInteger::bitwise_not()).
Ordering is done by replacing the straight Vector holding the query
result in the SQLResult object with a dedicated Vector subclass that
inserts result rows according to their sort key using a binary search.
This is done in the ResultSet class.
There are limitations:
- "SELECT ... ORDER BY 1" (or 2 or 3 etc) is supposed to sort by the
n-th result column. This doesn't work yet
- "SELECT ... column-expression alias ... ORDER BY alias" is supposed to
sort by the column with the given alias. This doesn't work yet
What does work however is something like
```SELECT foo FROM bar SORT BY quux```
i.e. sorted by a column not in the result set. Once functions are
supported it should be possible to sort by random functions.
This change unfortunately cannot be atomically made without a single
commit changing everything.
Most of the important changes are in LibIPC/Connection.cpp,
LibIPC/ServerConnection.cpp and LibCore/LocalServer.cpp.
The notable changes are:
- IPCCompiler now generates the decode and decode_message functions such
that they take a Core::Stream::LocalSocket instead of the socket fd.
- IPC::Decoder now uses the receive_fd method of LocalSocket instead of
doing system calls directly on the fd.
- IPC::ConnectionBase and related classes now use the Stream API
functions.
- IPC::ServerConnection no longer constructs the socket itself; instead,
a convenience macro, IPC_CLIENT_CONNECTION, is used in place of
C_OBJECT and will generate a static try_create factory function for
the ServerConnection subclass. The subclass is now responsible for
passing the socket constructed in this function to its
ServerConnection base; the socket is passed as the first argument to
the constructor (as a NonnullOwnPtr<Core::Stream::LocalServer>) before
any other arguments.
- The functionality regarding taking over sockets from SystemServer has
been moved to LibIPC/SystemServerTakeover.cpp. The Core::LocalSocket
implementation of this functionality hasn't been deleted due to my
intention of removing this class in the near future and to reduce
noise on this (already quite noisy) PR.
This makes the following code behave as expected:
Variant<int, String> x { some_string() };
x.visit(
[](String const&) {}, // Expectation is for this to be called
[](auto&) {});
As per previous discussion, it was decided that the Stream classes
should be constructed on the heap.
While I don't personally agree with this change, it does have the
benefit of avoiding Function object reconstructions due to the lambda
passed to Notifier pointing to a stale object reference. This also has
the benefit of not having to "box" objects for virtual usage, as the
objects come pre-boxed.
However, it means that we now hit the heap everytime we construct a
TCPSocket for instance, which might not be desirable.
Except for tangential accessors such as data(), there is no more feature
of FixedArray that is untested after this large expansion of its test
cases. These tests, with the help of the new NoAllocationGuard, also
test the allocation contract that was fixated in the last commit.
Hopefully this builds confidence in future Kernel uses of FixedArray
as well as its establishment in the real-time parts of the audio
subsystem. I'm excited :^)
FixedArray always *almost* had the following allocation guarantees:
There is (possibly) one allocation in the constructor and one (or more)
deallocation(s) in the destructor. No other operation allocates or
deallocates. With this removal of the public clear() method, which
nobody except the test used anyways, those guarantees are now completely
true and furthermore fixated with an explanatory comment.
Our generator is currently preferring the DST variant of the time zone
display names over the non-DST variant. LibTimeZone currently does not
have DST support, and operates in a mode that basically assumes DST does
not exist. Swap the display names for now just to be consistent until we
have DST support.
Note we will need to generate both of these variants and select the
appropriate one at runtime once we have DST support.
The following table in TR-35 includes a web of fall back rules when the
requested time zone style is unavailable:
https://unicode.org/reports/tr35/tr35-dates.html#dfst-zone
Conveniently, the subset of styles supported by ECMA-402 (and therefore
LibUnicode) all either fall back to GMT offset or to a style that is
unsupported but itself falls back to GMT offset.
This adds an API to use LibTimeZone to convert a time zone such as
"America/New_York" to a GMT offset string like "GMT-5" (short form) or
"GMT-05:00" (long form).
Instead of only having dummy functions that don't work with any input,
let's at least support one time zone: 'UTC'. This matches the basic
Temporal implementation for engines without ECMA-262, for example.
This mechanism was unsafe to use in any multithreaded context, since
the hook function was invoked on a raw pointer *after* decrementing
the local ref count.
Since we don't use it for anything anymore, let's just get rid of it.
This is a rather naive implementation, but serves as a first pass at
determining the GMT offset for a time zone at a particular point in
time. This implementation ignores DST (because we are not parsing any
RULE entries yet), and ignores any offset patterns of the form "Mon>4"
or "lastSun".
Currently, we define a CaseInsensitiveStringTraits structure for String.
Using this structure for StringView involves allocating a String from
that view, and a second string to convert that intermediate string to
lowercase.
This defines CaseInsensitiveStringViewTraits (and the underlying helper
case_insensitive_string_hash) to avoid allocations.
FixedArray now doesn't expose any infallible constructors anymore.
Rather, it exposes fallible methods. Therefore, it can be used for
OOM-safe code.
This commit also converts the rest of the system to use the new API.
However, as an example, VMObject can't take advantage of this yet,
as we would have to endow VMObject with a fallible static
construction method, which would require a very fundamental change
to VMObject's whole inheritance hierarchy.
Add a unit test for each sample pdf file that currently exists in the
anon user's `~/Document/pdf` directory.
- linear.pdf
- non-linearized.pdf
- complex.pdf
Each test ensures that the pdf document is parsed and that the page
count is the expected one.
So far we only had mmap(2) functionality on the /dev/mem device, but now
we can also do read(2) on it.
The test unit was updated to check we are doing it safely.
The previous implementation had some pretty short cycles and two fixed
points (1711463637 and 2389024350). If two keys hashed to one of these
values insertions and lookups would loop forever.
This version is based on a standard xorshift PRNG with period 2**32-1.
The all-zero state is usually forbidden, so we insert it into the cycle
at an arbitrary location.
The evaluation order of method parameters is unspecified in C++, and
so we couldn't rely on parse_statement() being called before
parse_escape() when building a MatchExpression.
With this patch, we explicitly parse what we need in the right order,
before building the MatchExpression object.
The generator parses metaZones.json to form a mapping of meta zones to
time zones (AKA "golden zone" in TR-35). This parser errantly assumed
this was a 1-to-1 mapping.
As it was, negative predicate test for remove_all_matching was
run on empty hash map, and could not remove anything, so test always
returned true. By duplicating it in state where hash maps contains
elements, we make sure that negative predicate has something to
do nothing on.
These were missed in 565a880ce5.
This wasn't an issue because these tests don't pledge/unveil anything,
so they could happily dlopen() the library at runtime. But this is now
needed in order to migrate LibUnicode towards weak symbols instead.
This was currently crashing Half-Life because it was a considered an
"Unknown" specifier. We can use the same case statement as the regular
hex format conversion (lower case 'x'), as the backend
to convert the number already supports upper/lower case input, hence
we get it for free :^)
This exposed a missing exception check in parseWebAssemblyModule(),
which could throw but still return a normal completion (which currently
works as we check VM::exception() at the right point, but breaks when
moving everything to exceptions).
The spec has a note stating that resolve binding will always return a
reference whose [[ReferencedName]] field is name. However this is not
correct as the underlying method GetIdentifierReference may throw on
env.HasBinding(name) thus it can throw. However, there are some
scenarios where it cannot throw because the reference is known to exist
in that case we use MUST with a comment.
Previously we might swallow invalid unicode point which would skip valid
ascii characters. This could be dangerous as we might skip a '"' thus
not closing a string where we should.
This might have been exploitable as it would not have been clear what
code gets executed when looking at a script.
Another approach to this would be simply replacing all invalid
characters with the replacement character (this is what v8 does). But
our lexer and parser are currently not set up for such a change.
It was possible for the "local_socket_read" and "local_socket_write"
tests to fail because we had exited the EventLoop before
BackgroundAction got around to invoking the completion callback.
The crash happened when trying to deferred_invoke() on the background
thread, calling Core::EventLoop::current() after said EventLoop had
returned from exec().
Fix this by not passing a completion callback, since we didn't need
one in the first place.
This is a raffinement of 49cbd4dcca.
Previously, the container was scanned to compute the size in the unhappy
path. Now, using `all_of` happy and unhappy path should be fast.
ISO C requires in section 7.2:
The assert macro is redefined according to the current state of NDEBUG
each time that <assert.h> is included.
Also add tests for `assert` multiple inclusion accordingly.
For setreuid and setresuid syscalls, -1 means to set the current
uid/euid/gid/egid value, to be more convenient for programming.
However, for other syscalls where we pass only one argument, there's no
justification to specify -1.
This behavior is identical to how Linux handles the value -1, and is
influenced by the fact that the manual pages for the group of one
argument syscalls that handle ID operations is ambiguous about this
topic.
The goal of this file is to enable C++ overloaded functions for
standard builtin functions that we use. It contains fallback
implementations for systems that do not have the builtins available.
This unbreaks the /var/run/utmp system which starts out as an empty
string, and is then turned into an object by the first update.
This isn't necessarily the best way for this to work, but it's how
it used to work, so this just fixes the regression for now.
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes#11247.
Also adds the minimal test case(s) from that issue.
Fixes a crash that was caused by a syntax error which is difficult to
catch by the parser: usually identifiers are accepted in column lists,
but they are not in a list of column values to be inserted in an INSERT.
Fixed this by putting in a heuristic check; we probably need a better
way to do this.
Included tests for this case.
Also introduced a new SQL Error code, `NotYetImplemented`, and return
that instead of crashing when encountering unimplemented SQL.
The handling of filesystem level errors was basically non-existing or
consisting of `VERIFY_NOT_REACHED` assertions. Addressed this by
* Adding `open` methods to `Heap` and `Database` which return errors.
* Changing the interface of methods of these classes and clients
downstream to propagate these errors.
The constructors of `Heap` and `Database` don't open the underlying
filesystem file anymore.
The SQL statement handlers return an `SQLErrorCode::InternalError`
error code if an error comes back from the lower levels. Note that some
of these errors are things like duplicate index entry errors that should
be caught before the SQL layer attempts to actually update the database.
Added tests to catch attempts to open weird or non-existent files as
databases.
Finally, in between me writing this patch and submitting the PR the
AK::Result<Foo, Bar> template got deprecated in favour of ErrorOr<Foo>.
This resulted in more busywork.
For example, consider the following adjacent entries in UnicodeData.txt:
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
Our current implementation would assign the display name "CJK Ideograph
Extension A" to code points U+3400 & U+4DBF, but not to the code points
in between. Not only should those code points be assigned a name, but
the Unicode spec also has formatting rules on what the names should be
(the names for these ranged code points are not as they appear in
UnicodeData.txt).
The spec also defines names for code point ranges that actually are
listed individually in UnicodeData.txt. For example:
2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;
Code points are only coalesced into a range if all fields after the name
are equivalent. Our parser will insert the range and its name formatting
pattern when it comes across the first code point in that range, then
ignore other code points in that range. This reduces the number of names
we generated by nearly 2,000.
At the moment we just check if we *can* render a simple triangle, we do
not yet actually test if the image is indeed the triangle we wanted.
This test also outputs the rendered image when GL_DEBUG is enabled to a
file called "picture.bmp" for manual verification.
Co-authored-by: sunverwerth <s.unverwerth@serenityos.org>
Since we no longer populate a Vector<String> the lifetime of the strings
in all of these tests is now messed up, as the Vector<StringView> now
points to free'd memory.
We attempt to fix this for the unit tests, by saving the results in a
RAII type that should live as long as the test wants to validate some
output of the ArgParser.
As noted by ECMA-402, if a supported locale contains all of a language,
script, and region subtag, then the implementation must also support the
locale without the script subtag. The most complicated example of this
is the zh-TW locale.
The list of locales in the CLDR database does not include zh-TW or its
maximized zh-Hant-TW variant. Instead, it inlcudes the zh-Hant locale.
However, zh-Hant-TW is listed in the default-content locale list in the
cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We
must then also support the zh-Hant-TW alias without the script subtag:
zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite
heavily tested by test262.
This is a naive implementation based on the symmetry with `asin`.
Before, I'm not really sure what we were doing, but it was returning
wildly incorrect results.
The initial `ForkStay` is only needed if the looping block has a
following block, if there's no following block or the following block
does not attempt to match anything, we should not insert the ForkStay,
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes#10952.
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
Also add slightly richer parse errors now that we can include a string
literal with returned errors.
This will allow us to use TRY() when working with JSON data.
Currently, we get the following results
-1 - -2 = -1
-2 - -1 = 1
Correct would be:
-1 - -2 = 1
-2 - -1 = -1
This was already attempted to be fixed in 7ed8970, but that change was
incorrect. This directly translates to LibJS BigInts having the same
incorrect behavior - it even was tested.
We were passing raw Gfx::Bitmap objects into the various image decoders
instead of encoded image data. This made all of them fail, but the test
expectations were set up in a way that aligned with this outcome.
With this patch, we now test the codecs for real. Except ICO, since we
don't have an ICO file handy. That's a FIXME.
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
This patch introduces table joins. It uses a pretty dumb algorithm-
starting with a singleton '__unity__' row consisting of a single boolean
value, a cartesian product of all tables in the 'FROM' clause is built.
This cartesian product is then filtered through the 'WHERE' clause,
again without any smarts just using brute force.
This patch required a bunch of busy work to allow for example the
ColumnNameExpression having to deal with multiple tables potentially
having columns with the same name.
Because SQL is the craptastic language that it is, sometimes expressions
need to know details about the calling statement. For example the tables
in the 'FROM' clause may be needed to determine which columns are
referenced in 'WHERE' expressions. So the current statement is added
to the ExecutionContext and a new 'execute' overload on Statement is
created which takes the Database and the Statement and builds an
ExecutionContaxt from those.
These are needed to distinguish columns from different tables with the
same column name in one and the same (joined) Tuple. Not quite happy
yet with this API; I think some sort of hierarchical structure would be
better but we'll burn that bridge when we get there :^)
This file contains the list of locales which default to their parent
locale's values. In the core CLDR dataset, these locales have their own
files, but they are empty (except for identity data). For example:
https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml
In the JSON export, these files are excluded, so we currently are not
recognizing these locales just by iterating the locale files.
This is a prerequisite for upgrading to CLDR version 40. One of these
default-content locales is the popular "en-US" locale, which defaults to
"en" values. We were previously inferring the existence of this locale
from the "en-US-POSIX" locale (many implementations, including ours,
strip variants such as POSIX). However, v40 removes the "en-US-POSIX"
locale entirely, meaning that without this change, we wouldn't know that
"en-US" exists (we would default to "en").
For more detail on this and other v40 changes, see:
https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
When I added this code in 1472f6d, I forgot to add tests for it. That's
why I didn't realize that the values were appended to the wrong
FormatBuilder object, so an empty string was returned instead of the
expected "nan"/"inf". This made debugging some FPU issues with the
ScummVM port significantly more difficult.
We create a base class called GenericFramebufferDevice, which defines
all the virtual functions that must be implemented by a
FramebufferDevice. Then, we make the VirtIO FramebufferDevice and other
FramebufferDevice implementations inherit from it.
The most important consequence of rearranging the classes is that we now
have one IOCTL method, so all drivers should be committed to not
override the IOCTL method or make their own IOCTLs of FramebufferDevice.
All graphical IOCTLs are known to all FramebufferDevices, and it's up to
the specific implementation whether to support them or discard them (so
we require extensive usage of KResult and KResultOr, together with
virtual characteristic functions).
As a result, the interface is much cleaner and understandable to read.
Also add a test to prevent this from happening again. There were two
bugs:
* The number of bytes just after processing the last value was written,
instead of the number of bytes after skipping remaining whitespace.
Confirmed by testing against GNU's `scanf()` since the man page
leaves something to be desired.
* The number of bytes was written to the wrong variable argument; i.e.
the first argument was overwritten.
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.
Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979
To ensure everything works as expected, a unit test was added with
multiple scenarios.
This binary has to have the SetUID flag, and we also bind-mount the
/usr/Tests directory to allow running of SetUID binaries.
The old versions were renamed to JS_DECLARE_OLD_NATIVE_FUNCTION and
JS_DEFINE_OLD_NATIVE_FUNCTION, and will be eventually removed once all
native functions were converted to the new format.
This function converts a single wide character into its multibyte
representation (UTF-8 in our case). It is called from libc++'s
`std::basic_ostream<wchar_t>::flush`, which gets called at program exit
from a global destructor in order to flush `std::wcout`.
Consider the situation where two shared libraries libA and libB, both
depending (as in having a NEEDED dtag) on libC. libA is first
dlopen()-ed, which produces libC to be mapped and linked. When libB is
dlopen()-ed the DynamicLinker would re-map and re-link libC though,
causing any previous references to its old location to be invalid. And
if libA's PLT has been patched to point to libC's symbols, then any
further invocations to libA will cause the code to jump to a virtual
address that isn't mapped anymore, therefore causing a crash. This
situation was reported in #10014, although the setup was more convolved
in the ticket.
This commit fixes the issue by distinguishing between a main program
loading being performed by Loader.so, and a dlopen() call. The main
difference between these two cases is that in the former the
s_globals_objects maps is always empty, while in the latter it might
already contain dependencies for the library being dlopen()-ed. Hence,
when collecting dependencies to map and link, dlopen() should skip those
that are present in the global map to avoid the issue described above.
With this patch the original issue seen in #10014 is gone, with all
python3 modules (so far) loading correctly.
A unit test reproducing a simplified issue is also included in this
commit. The unit test includes the building of two dynamic libraries A
and B with both depending on libline.so (and B also depending on A); the
test then dlopen()s libA, invokes one its function, then does the same
with libB.
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
The callback should be called as soon as the connection is established,
and if we actually set the callback when it already is, we expect it to
be called immediately.
See #10042 for details. In short: qemu doesn't seem to implement that
feature, therefore the test correctly fails. However, that does not help
us, so we skip that test.
This makes it so we don't need to specify the full path to all the
helper scripts we include() from different places in the codebase and
feels a lot cleaner.
DisjointChunks<T> provides a nice interface over multiple sequential
Vector<T>'s, allowing the user to iterate over/index into/slice from
said buffers as if they were a single contiguous buffer.
To work with views on such objects, DisjointSpans<T> is provided, which
has the same behaviour but does not own the underlying objects.
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.
As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
Command used:
grep -Pirn '(out|warn)ln\((?!["\)]|format,|stderr,|stdout,|output, ")' \
AK Kernel/ Tests/ Userland/
(Plus some manual reviewing.)
Let's pick ArgsParser as an example:
outln(file, m_general_help);
This will fail at runtime if the general help happens to contain braces.
Even if this transformation turns out to be unnecessary in a place or
two, this way the code is "more obviously" correct.
Commit 890c647e0f fixed an off-by-one bug, so the mapping of the page
at the very end of the user address space now works correctly.
This change adjusts the test so cover the corner cases the original
version was designed too.validate.
Previously, LibUnicode would store the values of a keyword as a Vector.
For example, the locale "en-u-ca-abc-def" would have its keyword "ca"
stored as {"abc, "def"}. Then, canonicalization would occur on each of
the elements in that Vector.
This is incorrect because, for example, the keyword value "true" should
only be dropped if that is the entire value. That is, the canonical form
of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change
for canonicalization. However, we would canonicalize that locale as
"en-u-kb-abc".
This should fix the flaky tests of test-js.
It also fixes the tests when running with the -g flag since the values
will not be garbage collected too soon.
Note that the algorithm in the Unicode spec is for checking that a code
point precedes U+0307, but the special casing condition NotBeforeDot is
interested in the inverse of this rule.
Executing 100 times vs 10 times doesn't increase test coverage
substantial, this API is stable and more iterations is just a
waste of time.
Without KVM this test is a clear outlier in runtime during CI:
```
START LibC/TestQsort (106/172)
PASS LibC/TestQsort (41.233692s)
```
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:
auto value = hash_map.get(key).value_or(fallback_value);
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
Calendar subtags are a bit of an odd-man-out in that we must match the
variants "ethiopic-amete-alem" in that order, without any other variant
in the locale. So a separate method is needed for this, and we now defer
sorting the variant list until after other canonicalization is done.
Unicode TR35 defines how locale subtag aliases should be emplaced when
converting a locale to canonical form. For most subtags, it is a simple
substitution. Language subtags depend on context; for example, the
language "sh" should become "sr-Latn", but if the original locale has a
script subtag already ("sh-Cyrl"), then only the language subtag of the
alias should be taken ("sr-Latn").
To facilitate this, we now make two passes when canonicalizing a locale.
In the first pass, we convert the LocaleID structure to canonical syntax
(where the conversions all happen in-place). In the second pass, we form
the canonical string based on the canonical syntax.
Originally, it was convenient to store the parsed Unicode locale data as
views into the original string being parsed. But to implement locale
aliases will require mutating the data that was parsed. To prepare for
that, store the parsed data as proper strings.