Currently, the unique string lists are stored in the initialized data
sections of their shared libraries. In order to move the data to the
read-only section, generate the strings using RLE arrays.
We generate two arrays: the first is the RLE data itself, the second is
a list of indices into the RLE array for each string. We then generate a
decoding method to convert an RLE string to a StringView.
We are downloading these directly into the build directory now, and
generating the source code from there, so we no longer need the
manually created directory.
While we are at it, remove two variables that seem to be no longer in
use, and at least one of which is confusing regarding a missing prefix.
This cache was disabled in 3127454 because it wasn't needed and there
was a race between the builders for this cache. Then commit 0c95d99
started fuzzing the generated Unicode / TZDB data. Since then, we've
been pulling this data from the live servers instead of Azure's cache.
We do a similar trick for the compiler cache. This allows each builder
to separately push their local data cache (if it changed) while pulling
a shared cache, without the race outlined in commit 3127454. This is
needed for a subsequent commit which will enable this cache for Fuzzer
builds.
This isn't called out in TR-35, but before ICU even looks at CLDR data,
it adds a hard-coded set of default patterns to each locale's calendar.
It has done this since 2006 when its DateTimeFormat feature was first
created. Several test262 tests depend on this, which under ECMA-402,
falls into "implementation defined" behavior. For compatibility, we
can do the same in LibUnicode.
In the generated unique string list, index 0 is the empty string, and is
used to indicate a value doesn't exist in the CLDR. Check for this
before returning an empty calendar symbol.
For example, an upcoming commit will add the fixed day period "noon",
which not all locales support.
Commit ec7d535 only partially handled the case of flexible day periods
rolling over midnight, in that it only worked for hours after midnight.
For example, the en locale defines a day period range of [21:00, 06:00).
The previous method of adding 24 hours to the given hour would change
e.g. 23:00 to 47:00, which isn't valid.
This abstraction layer is mainly for ATA ports (AHCI ports, IDE ports).
The goal is to create a convenient and flexible framework so it's
possible to expand to support other types of controller (e.g. Intel PIIX
and ICH IDE controllers) and to abstract operations that are possible on
each component.
Currently only the ATA IDE code is affected by this, making it much
cleaner and readable - the ATA bus mastering code is moved to the
ATAPort code so more implementations in the near future can take
advantage of such functionality easily.
In addition to that, the hierarchy of the ATA IDE code resembles more of
the SATA AHCI code now, which means the IDEChannel class is solely
responsible for getting interrupts, passing them for further processing
in the ATAPort code to take care of the rest of the handling logic.
This is a start to properly letting us cross-compile Lagom where both
the Tools and the BUILD_LAGOM=ON build are using Lagom CMakeLists.
The initial cut allows an Android build to succeed, more or less.
But there are issues with namespace clashes when using FetchContent with
this approach.
When patterns, grouping digits, symbols, etc. for a requested numbering
system are not found, use the locale's default numbering system. This
will allow using the correct digits e.g. for the locale "en-u-nu-arab"
even though the "en" locale only contains patterns for the "latn"
numbering system.
Replacement conditions for `requires_argument` have been chosen based
on what would be most convenient for implementing an eventual optional
argument mode.
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.
No functional changes.
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Error::from_string_literal now takes direct char const*s, while
Error::from_string_view does what Error::from_string_literal used to do:
taking StringViews. This change will remove the need to insert `sv`
after error strings when returning string literal errors once
StringView(char const*) is removed.
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
These are mostly minor mistakes I've encountered while working on the
removal of StringView(char const*). The usage of builder.put_string over
Format<FormatString>::format is preferrable as it will avoid the
indirection altogether when there's no formatting to be done. Similarly,
there is no need to do format(builder, "{}", number) when
builder.put_u64(number) works equally well.
Additionally a few Strings where only constant strings were used are
replaced with StringViews.