The generate_mapping helper generates a series of structs like:
Array<SomeType, 1> s_mapping_key_0 {};
Array<SomeType, 2> s_mapping_key_1 {};
Array<SomeType, 3> s_mapping_key_2 {};
Array<Span<SomeType const>> s_mapping { {
s_mapping_key_0.span(),
s_mapping_key_1.span(),
s_mapping_key_2.span(),
} };
Where the names of the struct were generated by the format_mapping_name
lambda inside the helper. Rather than this lambda making assumptions on
how each generator wants to name its structs, add a parameter for the
caller to provide a naming formatter.
This is because the TimeZoneData generator will want pretty specific
identifier formatting rules.
This is a temporary mechanism while LibUnicode is in an in-between state
where some symbols are weakly linked and others are dynamically loaded.
The latter require an asm() label to be loaded.
The generated data for libunicodedata.so is quite large, and loading it
is a price paid by nearly every application by way of depending on
LibRegex. In order to defer this cost until an application actually uses
one of the surrounding APIs, dynamically load the generated symbols.
To be able to load the symbols dynamically, the generated methods must
have demangled names. Typically, this is accomplished with `extern "C"`
blocks. The clang toolchain complains about this here because the types
returned from the generators are strictly C++ types. So to demangle the
names, we use the asm() compiler directive to manually define a symbol
name; the caveat is that we *must* be sure the symbols are unique. As an
extra precaution, we prefix each symbol name with "unicode_". For more
details, see: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html
This symbol loader used in this implementation provides the additional
benefit of removing many [[maybe_unused]] attributes from the LibUnicode
methods. Internally, if ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF, the
loader is able to stub out the function pointers it returns.
Note that as of this commit, LibUnicode is still directly linked against
LibUnicodeData. This commit is just a first step towards removing that.
The evolution of UniqueStorage has been as follows:
1. It was created as UniqueStringStorage to ensure only one copy of each
unique string is generated. Interested parties stored an index into
a unique string list, rather than the string itself.
Commits: f9e605397c and 04e6b43f05
2. It became apparent that non-string structures could also be de-
duplicated to reduce the size of libunicode.so. UniqueStringStorage
was generalized to UniqueStorage for this purpose.
Commit: d8e6beb14f
It's now also apparent that there's heavy duplication of lists of
structures. For example, the NumberFormat generator stores 4 lists of
NumberFormat objects. In total, we currently generate nearly 2,000 lists
of these objects, of which 275 are unique.
This change updates UniqueStorage to support storing lists. The only
change is how the storage is generated - we generate each stored list
individually, then an array storing spans of those lists.
UniqueStringStorage is used to ensure only one copy of a string will be
generated, and interested parties store just an index into the generated
storage. Generalize this class to allow any* type to be stored uniquely.
* To actually be storable, the type must have both an AK::Format and an
AK::Traits overload available.
This hasn't mattered yet by chance, because the source for all enums
contains names of the same case. But the enum generated for hour cycle
regions will have mixed case. Sort them case-insensitively in order to
traverse these names in the same order in both generate_enum and
generate_mapping.
Previously, we were just copying the locale data into default-content
locales (for example, copying the "en" data into "en-US"). Instead, we
can just define the default-content locales as aliases to their main
locales.
This will be used for locale aliases as well. Also rename the "property"
field in this struct to "name", as it no longer is only used for
property aliases.
Also add slightly richer parse errors now that we can include a string
literal with returned errors.
This will allow us to use TRY() when working with JSON data.
For example, there isn't a unique set of data for the en-US locale;
rather, it defaults to the data for the en locale. See this commit for
much more detail: 357c97dfa8
The data used for number formatting is going to grow quite a bit when
the cldr-units package is parsed. To prevent the generated UnicodeLocale
file from growing outrageously large, the number formatting data can go
into its own file. To prepare for this, move code that will be common
between the generators for UnicodeLocale and UnicodeNumberFormat to the
utility header.
The *_from_string() and resolve_*_alias() generated methods are the last
remaining users of HashMap in the LibUnicode generated files (read: the
last methods not using compile-time structures). This converts these
methods to use an array containing pairs of hash values to the desired
lookup value.
Because this code generation is the same between GenerateUnicodeData.cpp
and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the
LibUnicode generators to contain the method that generates the methods.