0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	3f0095b57a	LibUnicode: Skip unknown languages and territories Some CLDR languages.json / territories.json files contain localizations for some languages/territories that are otherwise not present in the CLDR database. We already don't generate anything in UnicodeLocale.cpp for these anomalies, but this will stop us from even storing that data in the generator's memory. This doesn't affect the output of the generator, but will have an effect after an upcoming commit to unique-ify all of the strings in the CLDR.	2021-10-10 22:21:48 +02:00
Timothy Flynn	c8dbcdb0bc	LibUnicode: Do not compare generated file contents before writing This is now covered by unicode_data.cmake after the superbuild changes.	2021-09-30 17:37:57 +01:00
Idan Horowitz	6704961c82	AK: Replace the mutable String::replace API with an immutable version This removes the awkward String::replace API, which was the only String API that mutated the String, and replaces it with a new immutable version that returns a new String with the replacements applied. This also fixes a couple of UAFs that were caused by the use of this API. As an optimization, an equivalent StringView::replace API was also added to remove unnecessary String allocations of the form: `String { view }.replace(...);`	2021-09-11 20:36:43 +03:00
Timothy Flynn	b1d4bcf364	LibUnicode: Generate numeric keyword values for each locale This is needed for Intl.NumberFormat's usage of the ResolveLocale AO, where the [[RelevantExtensionKeys]] internal slot will be "nu".	2021-09-11 11:05:50 +01:00
Timothy Flynn	32a2a02489	LibUnicode: Fix typo in listPatterns.json parsing method	2021-09-08 21:08:48 +01:00
Timothy Flynn	4ad2159812	LibUnicode: Remove Unicode locale variants from CLDR path names There are only a couple of cases like this, but there are some locale paths in the CLDR that contain variants. For example, there isn't an en-US path, but there is an en-US-POSIX path. This interferes with the operation to search for locales by name. The algorithm is such that searching for en-US will not result in en-US-POSIX being found. To resolve this, we should remove variants from the locale name.	2021-09-06 23:49:56 +01:00
Timothy Flynn	3f64a14e06	LibUnicode: Parse and generate the Unicode locale list patterns dataset This data informs consumers how to join lists of values. For example, in en-US, the list ["a", "b", "c"] formatted to a string should become "a, b, and c".	2021-09-06 23:49:56 +01:00
Timothy Flynn	9cd986d8c0	LibUnicode: Extract cldr-misc dataset from CLDR database	2021-09-06 23:49:56 +01:00
Timothy Flynn	e6a2ab1202	LibUnicode: Generate an implementation of the Add Likely Subtags method	2021-09-04 13:51:40 +01:00
Timothy Flynn	28ae63177e	LibUnicode: Generate the entire locale likely-subtags dataset The number of aliases in the likely-subtags dataset is quite large, so this also needed to change the way the data is generated. Otherwise, the compiler would complain about the size of the generated code. Previously, a static method was generated that would effectively parse the dataset into a HashMap of Unicode::LanguageID at runtime. We now perform that parsing at generation-time, and instead generate an Array of a structure similar to Unicode::LanguageID (we cannot use the same structure because it contains String and Optional, which cannot be used at compile-time).	2021-09-04 13:51:40 +01:00
Timothy Flynn	1fbc5dba08	LibUnicode: Generate Unicode locale likely subtag data CLDR contains a set of likely subtag data where, given a locale, you can resolve what is the most likely language, script, or territory of that locale. This data is needed for resolving territory aliases. These aliases might contain multiple territories, and we need to resolve which of those territories is most likely correct for a locale. Note that the likely subtag data is quite huge (a few thousand entries). As an optimization encouraged by the spec, we only generate the smallest subset of this data that we actually need (about 150 entries).	2021-09-01 14:14:47 +01:00
Timothy Flynn	9ae7ac4c87	LibUnicode: Generate complex Unicode locale alias matching Most alias substitutions are "simple", meaning that alias matching is done by examining a single locale subtag. However, there are a handful of "complex" aliases where matching is done by examining multiple subtags. For example, the variant subtag "lojban" causes the locale "art-lojban" to be canonicalized to "jbo", but only when the language subtag is "art" (i.e. this should not occur for the locale "en-lojban"). This generates a method to perform complex alias matching.	2021-09-01 14:14:47 +01:00
Timothy Flynn	9b118f1f06	LibUnicode: Generate Unicode locale alias data CLDR contains a set of aliases for languages, territories, etc. that are no longer meant to be used (e.g. due to deprecation). For example, the language "aam" is deprecated and should be canonicalized as "aas".	2021-09-01 14:14:47 +01:00
Timothy Flynn	caf5b6fa6f	LibUnicode: Extract cldr-core dataset from CLDR database	2021-09-01 14:14:47 +01:00
Andrew Kaster	63956b36d0	Everywhere: Move all host tools into the Lagom/Tools subdirectory This allows us to remove all the add_subdirectory calls from the top level CMakeLists.txt that referred to targets linking LagomCore. Segregating the host tools and Serenity targets helps us get to a place where the main Serenity build can simply use a CMake toolchain file rather than swapping all the compiler/sysroot variables after building host libraries and tools.	2021-08-28 08:44:17 +01:00

15 commits