0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	04e6b43f05	LibUnicode: Move (soon-to-be) common code out of GenerateUnicodeLocale The data used for number formatting is going to grow quite a bit when the cldr-units package is parsed. To prevent the generated UnicodeLocale file from growing outrageously large, the number formatting data can go into its own file. To prepare for this, move code that will be common between the generators for UnicodeLocale and UnicodeNumberFormat to the utility header.	2021-11-12 20:46:38 +00:00
Timothy Flynn	be69eae651	LibUnicode: Precompute the compact scale of each number formatting rule This will be needed for the ComputeExponentForMagnitude AO for compact formatting, namely step 5b: Let exponent be an implementation- and locale-dependent (ILD) integer by which to scale a number of the given magnitude in compact notation for the current locale.	2021-11-12 09:17:08 +00:00
Timothy Flynn	230b133ee3	LibUnicode: Parse number formats into zero/positive/negative patterns A number formatting pattern in the CLDR contains one or two entries, delimited by a semi-colon. Previously, LibUnicode was just storing the entire pattern as one string. This changes the generator to split the pattern on that delimiter and generate the 3 unique patterns expected by ECMA-402. The rules for generating the 3 patterns are as follows: * If the pattern contains 1 entry, it is the zero pattern. The positive pattern is the zero pattern prepended with {plusSign}. The negative pattern is the zero pattern prepended with {minusSign}. * If the pattern contains 2 entries, the first is the zero pattern, and the second is the negative pattern. The positive pattern is the zero pattern prepended with {plusSign}.	2021-11-12 09:17:08 +00:00
Timothy Flynn	1244ebcd4f	LibUnicode: Parse and generate standard accounting formatting rules Also known as "currency-accounting" in some CLDR documentation.	2021-11-12 09:17:08 +00:00
Timothy Flynn	967afc1b84	LibUnicode: Parse and generate standard currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	bffd73e0d4	LibUnicode: Parse and generate standard decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	feb8c22a62	LibUnicode: Parse and generate standard percentage formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	4317a1b552	LibUnicode: Parse and generate compact currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	604a596c90	LibUnicode: Parse and generate compact decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	12b468a588	LibUnicode: Begin parsing and generating locale number systems The number system data in the CLDR contains information on how to format numbers in a locale-dependent manner. Start parsing this data, beginning with numeric symbol strings. For example the symbol NaN maps to "NaN" in the en-US locale, and "非數值" in the zh-Hant locale.	2021-11-12 09:17:08 +00:00
Timothy Flynn	d3e83c9934	LibUnicode: Parse alternate default numbering systems Some locales in the CLDR have alternate default numbering systems listed under "defaultNumberingSystem-alt-*", e.g.: "defaultNumberingSystem": "arab", "defaultNumberingSystem-alt-latn": "latn", "otherNumberingSystems": { "native": "arab" }, We were previously only parsing "defaultNumberingSystem" and "otherNumberingSystems". This odd format appears to be an artifact of converting from XML.	2021-11-12 09:17:08 +00:00
Timothy Flynn	ae66188d43	LibUnicode: Capitalize generated identifiers in lieu of full title case This isn't particularly important because this generates code that is quite hidden from outside callers. But when viewing the generated code, it's a bit nicer to read e.g. enum identifiers such as "MinusSign" rather than "Minussign".	2021-11-12 09:17:08 +00:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Timothy Flynn	357c97dfa8	LibUnicode: Parse the CLDR's defaultContent.json locale list This file contains the list of locales which default to their parent locale's values. In the core CLDR dataset, these locales have their own files, but they are empty (except for identity data). For example: https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml In the JSON export, these files are excluded, so we currently are not recognizing these locales just by iterating the locale files. This is a prerequisite for upgrading to CLDR version 40. One of these default-content locales is the popular "en-US" locale, which defaults to "en" values. We were previously inferring the existence of this locale from the "en-US-POSIX" locale (many implementations, including ours, strip variants such as POSIX). However, v40 removes the "en-US-POSIX" locale entirely, meaning that without this change, we wouldn't know that "en-US" exists (we would default to "en"). For more detail on this and other v40 changes, see: https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba	2021-11-09 20:44:52 +01:00
Timothy Flynn	3ad159537e	LibUnicode: Use u16 for unique string indices instead of size_t Typically size_t is used for indices, but we can take advantage of the knowledge that there are only approximately 46K unique strings in the generated UnicodeLocale.cpp file. Therefore, we can get away with using u16 to store indices. There is a VERIFY that will fail if we ever exceed the limits of u16. On x86_64 builds, this reduces libunicode.so from 9.2 MiB to 7.3 MiB. On i686 builds, this reduces libunicode.so from 3.9 MiB to 3.3 MiB. These savings are entirely in the .rodata section of the shared library.	2021-10-15 00:06:18 +01:00
Timothy Flynn	f91d63af83	LibUnicode: Generate enum/alias from-string methods without a HashMap The _from_string() and resolve__alias() generated methods are the last remaining users of HashMap in the LibUnicode generated files (read: the last methods not using compile-time structures). This converts these methods to use an array containing pairs of hash values to the desired lookup value. Because this code generation is the same between GenerateUnicodeData.cpp and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the LibUnicode generators to contain the method that generates the methods.	2021-10-13 16:38:51 +02:00
Timothy Flynn	597379e864	LibUnicode: Generate and use unique locale-related alias strings Almost all of these are already in the unique string list.	2021-10-10 22:21:48 +02:00
Timothy Flynn	acb7bd917f	LibUnicode: Generate and use unique subtag and complex alias strings	2021-10-10 22:21:48 +02:00
Timothy Flynn	3d67f6bd29	LibUnicode: Generate and use unique list-format strings The list-format strings used for Intl.ListFormat are small, but quite heavily duplicated. For example, the string "{0}, {1}" appears 6,519 times. Generate unique strings for this data to avoid duplication.	2021-10-10 22:21:48 +02:00
Timothy Flynn	f9e605397c	LibUnicode: Generate and use a set of unique locale-related strings In the generated UnicodeLocale.cpp file, there are 296,408 strings for localizations of languages, territories, scripts, currencies & keywords. Of these, only 43,848 (14.8%) are actually unique, so there are quite a large number of duplicated strings. This generates a single compile-time array to store these strings. The arrays for the localizations now store an index into this single array rather than duplicating any strings.	2021-10-10 22:21:48 +02:00
Timothy Flynn	3f0095b57a	LibUnicode: Skip unknown languages and territories Some CLDR languages.json / territories.json files contain localizations for some languages/territories that are otherwise not present in the CLDR database. We already don't generate anything in UnicodeLocale.cpp for these anomalies, but this will stop us from even storing that data in the generator's memory. This doesn't affect the output of the generator, but will have an effect after an upcoming commit to unique-ify all of the strings in the CLDR.	2021-10-10 22:21:48 +02:00
Timothy Flynn	c8dbcdb0bc	LibUnicode: Do not compare generated file contents before writing This is now covered by unicode_data.cmake after the superbuild changes.	2021-09-30 17:37:57 +01:00
Idan Horowitz	6704961c82	AK: Replace the mutable String::replace API with an immutable version This removes the awkward String::replace API, which was the only String API that mutated the String, and replaces it with a new immutable version that returns a new String with the replacements applied. This also fixes a couple of UAFs that were caused by the use of this API. As an optimization, an equivalent StringView::replace API was also added to remove unnecessary String allocations of the form: `String { view }.replace(...);`	2021-09-11 20:36:43 +03:00
Timothy Flynn	b1d4bcf364	LibUnicode: Generate numeric keyword values for each locale This is needed for Intl.NumberFormat's usage of the ResolveLocale AO, where the [[RelevantExtensionKeys]] internal slot will be "nu".	2021-09-11 11:05:50 +01:00
Timothy Flynn	32a2a02489	LibUnicode: Fix typo in listPatterns.json parsing method	2021-09-08 21:08:48 +01:00
Timothy Flynn	4ad2159812	LibUnicode: Remove Unicode locale variants from CLDR path names There are only a couple of cases like this, but there are some locale paths in the CLDR that contain variants. For example, there isn't an en-US path, but there is an en-US-POSIX path. This interferes with the operation to search for locales by name. The algorithm is such that searching for en-US will not result in en-US-POSIX being found. To resolve this, we should remove variants from the locale name.	2021-09-06 23:49:56 +01:00
Timothy Flynn	3f64a14e06	LibUnicode: Parse and generate the Unicode locale list patterns dataset This data informs consumers how to join lists of values. For example, in en-US, the list ["a", "b", "c"] formatted to a string should become "a, b, and c".	2021-09-06 23:49:56 +01:00
Timothy Flynn	9cd986d8c0	LibUnicode: Extract cldr-misc dataset from CLDR database	2021-09-06 23:49:56 +01:00
Timothy Flynn	e6a2ab1202	LibUnicode: Generate an implementation of the Add Likely Subtags method	2021-09-04 13:51:40 +01:00
Timothy Flynn	28ae63177e	LibUnicode: Generate the entire locale likely-subtags dataset The amount of aliases in the likely-subtags dataset is quite large, so this also needed to change the way the data is generated. Otherwise, the compiler would complain about the size of the generated code. Previously, a static method was generated that would effectively parse the dataset into a HashMap of Unicode::LanguageID at runtime. We now perform that parsing at generation-time, and instead generate an Array of a structure similar to Unicode::LanguageID (we cannot use the same structure because it contains String and Optional, which cannot be used at compile-time).	2021-09-04 13:51:40 +01:00
Timothy Flynn	1fbc5dba08	LibUnicode: Generate Unicode locale likely subtag data CLDR contains a set of likely subtag data where, given a locale, you can resolve what is the most likely language, script, or territory of that locale. This data is needed for resolving territory aliases. These aliases might contain multiple territories, and we need to resolve which of those territories is most likely correct for a locale. Note that the likely subtag data is quite huge (a few thousand entries). As an optimization encouraged by the spec, we only generate the smallest subset of this data that we actually need (about 150 entries).	2021-09-01 14:14:47 +01:00
Timothy Flynn	9ae7ac4c87	LibUnicode: Generate complex Unicode locale alias matching Most alias substitutions are "simple", meaning that alias matching is done by examining a single locale subtag. However, there are a handful of "complex" aliases where matching is done by examining multiple subtags. For example, the variant subtag "lojban" causes the locale "art-lojban" to be canonicalized to "jbo", but only when the language subtag is "art" (i.e. this should not occur for the locale "en-lojban"). This generates a method to perform complex alias matching.	2021-09-01 14:14:47 +01:00
Timothy Flynn	9b118f1f06	LibUnicode: Generate Unicode locale alias data CLDR contains a set of aliases for languages, territories, etc. that no longer are meant to be used (e.g. due to deprecation). For example, the language "aam" is deprecated and should be canonicalized as "aas".	2021-09-01 14:14:47 +01:00
Timothy Flynn	caf5b6fa6f	LibUnicode: Extract cldr-core dataset from CLDR database	2021-09-01 14:14:47 +01:00
Andrew Kaster	63956b36d0	Everywhere: Move all host tools into the Lagom/Tools subdirectory This allows us to remove all the add_subdirectory calls from the top level CMakeLists.txt that referred to targets linking LagomCore. Segregating the host tools and Serenity targets helps us get to a place where the main Serenity build can simply use a CMake toolchain file rather than swapping all the compiler/sysroot variables after building host libraries and tools.	2021-08-28 08:44:17 +01:00

35 commits
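
The rules quoted in commit 230b133ee3 for deriving the zero, positive, and negative patterns can be illustrated with a minimal sketch. This is not the LibUnicode generator's code: the `NumberPatterns` struct and the `split_number_pattern` function are hypothetical names introduced here, and the sketch uses the C++ standard library rather than AK types. It applies only the rules stated in the commit message (split on the semicolon, prepend `{plusSign}` or `{minusSign}` as needed).

```cpp
#include <string>
#include <string_view>

// Hypothetical container for the three patterns expected by ECMA-402.
// This is an illustration only, not a type from LibUnicode.
struct NumberPatterns {
    std::string zero;
    std::string positive;
    std::string negative;
};

// Hypothetical helper: splits a CLDR-style pattern on ';' and derives the
// three patterns according to the rules in the commit message.
NumberPatterns split_number_pattern(std::string_view pattern)
{
    NumberPatterns result;
    auto delimiter = pattern.find(';');

    if (delimiter == std::string_view::npos) {
        // One entry: it is the zero pattern; the negative pattern is the
        // zero pattern prepended with {minusSign}.
        result.zero = std::string(pattern);
        result.negative = "{minusSign}" + result.zero;
    } else {
        // Two entries: the first is the zero pattern, the second is the
        // negative pattern.
        result.zero = std::string(pattern.substr(0, delimiter));
        result.negative = std::string(pattern.substr(delimiter + 1));
    }

    // In both cases the positive pattern is the zero pattern prepended
    // with {plusSign}.
    result.positive = "{plusSign}" + result.zero;
    return result;
}
```

Under these assumptions, a single-entry pattern such as `#,##0.###` yields a negative pattern of `{minusSign}#,##0.###`, while a two-entry pattern such as `#,##0.00;(#,##0.00)` keeps `(#,##0.00)` as its negative pattern.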