0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	becec3578f	LibTimeZone+LibUnicode: Generate string data with run-length encoding Currently, the unique string lists are stored in the initialized data sections of their shared libraries. In order to move the data to the read-only section, generate the strings using RLE arrays. We generate two arrays: the first is the RLE data itself, the second is a list of indices into the RLE array for each string. We then generate a decoding method to convert an RLE string to a StringView.	2022-08-16 16:56:17 +02:00
Timothy Flynn	b2709f161e	LibUnicode: Generate per-locale approximately & range separator symbols	2022-07-20 22:30:16 +01:00
Timothy Flynn	c849cb9d76	LibUnicode: Fall back to per-locale default numbering systems When patterns, grouping digits, symbols, etc. for a requested numbering system are not found, use the locale's default numbering system. This will allow using the correct digits, e.g., for the locale "en-u-nu-arab" even though the "en" locale only contains patterns for the "latn" numbering system.	2022-07-15 12:31:43 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
Timothy Flynn	232df4196b	LibUnicode: Replace NumberFormat::Plurality with Unicode::PluralCategory To prepare for using plural rules within number & duration format, this removes the NumberFormat::Plurality enumeration. This also adds PluralCategory::ExactlyZero & PluralCategory::ExactlyOne. These are used in locales like French, where PluralCategory::One really means any value from 0.00 to 1.99. PluralCategory::ExactlyOne means only the value 1, as the name implies. These exact rules are not known to the general plural rules; they are used explicitly for number / currency formatting.	2022-07-08 20:33:52 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses with String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00
Idan Horowitz	f4785e2468	LibUnicode: Generate data about DurationFormat-required units as well	2022-07-01 01:00:05 +03:00
Idan Horowitz	573061e76c	LibUnicode: Extract the timeSeparator numeric symbol from CLDR This will be used by Intl.DurationFormat.	2022-07-01 01:00:05 +03:00
Timothy Flynn	066352c9aa	LibJS+LibUnicode: Align ECMA-402 "sanctioned" terminology with UTS 35 This is an editorial change in the Intl spec. See: https://github.com/tc39/ecma402/commit/087995c https://github.com/tc39/ecma402/commit/233d29c This also adds a missing spec link for the sanctioned units and fixes a broken spec link for IsSanctionedSingleUnitIdentifier. In LibUnicode, the NumberFormat generator is updated to use the constexpr helper to retrieve sanctioned units.	2022-03-30 14:24:32 +01:00
Timothy Flynn	71d86261c3	LibUnicode: Use BCP 47 data to filter valid numbering system names There isn't too much of an effective difference here other than that the BCP 47 data contains some aliases we would otherwise not handle.	2022-02-16 07:23:07 -05:00
Timothy Flynn	63c3437274	LibUnicode: Use BCP 47 data to generate available calendars and numbers BCP 47 will be the single source of truth for known calendar and number system keywords, and their aliases (e.g. "gregory" is an alias for "gregorian"). Move the generation of available keywords to where we parse the BCP 47 data, so that hard-coded aliases may be removed from other generators.	2022-02-16 07:23:07 -05:00
Timothy Flynn	f39540876b	LibUnicode: Port the CLDR number format generator to the stream API	2022-02-14 11:39:46 -05:00
Timothy Flynn	6efbafa6e0	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	2022-01-31 18:23:22 +00:00
Timothy Flynn	481ced53d8	LibUnicode: Generate a list of available numbering systems	2022-01-31 00:32:41 +00:00
Timothy Flynn	2d2f713426	LibUnicode: Generate per-locale minimum grouping digit values Previously, we were breaking up digits into groups without regard for the locale's minimumGroupingDigits value in the CLDR. This value is 1 in most locales, but is 2 in locales such as pl-PL. What this means is that in those locales, the group separator should only be inserted if the thousands group has at least 2 digits. So 1000 is formatted as "1,000" in en-US, but "1000" in pl-PL. And 10000 is "10,000" in en-US and "10 000" in pl-PL.	2022-01-27 20:30:52 +00:00
Idan Horowitz	877ae85017	LibJS+LibUnicode: Make static const Utf8View variables constexpr	2022-01-17 14:46:07 +00:00
Timothy Flynn	0d8120eeb2	LibUnicode: Perform number system lookups by enumeration value Now that number systems are generated as an enum, we can generate the number system data in the order of that enum. This lets us perform lookups of that data by index instead of a loop of string comparisons.	2022-01-12 10:49:07 +01:00
Timothy Flynn	c5138f0f2b	LibUnicode: Parse number system digits from the CLDR We had a hard-coded table of number system digits copied from ECMA-402. Turns out these digits are in the CLDR, so let's parse the digits from there instead of hard-coding them.	2022-01-12 10:49:07 +01:00
Timothy Flynn	b543c3e490	Meta: Don't assume how each generator wants to generate keyed map names The generate_mapping helper generates a series of structs like: Array<SomeType, 1> s_mapping_key_0 {}; Array<SomeType, 2> s_mapping_key_1 {}; Array<SomeType, 3> s_mapping_key_2 {}; Array<Span<SomeType const>> s_mapping { { s_mapping_key_0.span(), s_mapping_key_1.span(), s_mapping_key_2.span(), } }; where the names of the structs were generated by the format_mapping_name lambda inside the helper. Rather than this lambda making assumptions on how each generator wants to name its structs, add a parameter for the caller to provide a naming formatter. This is because the TimeZoneData generator will want pretty specific identifier formatting rules.	2022-01-11 00:36:45 +01:00
Timothy Flynn	98709d9be1	LibUnicode: Convert UnicodeNumberFormat to link with weak symbols Currently, we load the generated Unicode symbols with dlopen at runtime. This is unnecessary as of `565a880ce5`. Applications that want Unicode data now link directly against the shared library holding that data. So the same functionality can be achieved with weak symbols.	2022-01-04 22:49:43 +00:00
Timothy Flynn	a1f0ca59ae	LibUnicode: Dynamically load the generated UnicodeNumberFormat symbols	2021-12-21 13:09:49 -08:00
Timothy Flynn	415763b1b3	LibUnicode: Define traits for a vector of integral/enum types Any generator which defines a unique storage instance for a list of numbers will need this.	2021-12-13 21:28:56 -08:00
Timothy Flynn	1e95e7716b	LibUnicode: Generate unique units	2021-12-11 14:17:47 +00:00
Timothy Flynn	4c2c8b8e33	LibUnicode: Generate unique number systems	2021-12-11 14:17:47 +00:00
Timothy Flynn	2a7f36b392	LibJS+LibUnicode: Generate unique numeric symbol lists There are 443 number system objects generated, each of which held an array of number system symbols. Of those 443 arrays, only 39 are unique. To uniquely store these, this change moves the generated NumericSymbol enumeration to the public LibUnicode/NumberFormat.h header with a predefined set of symbols that we need. This is to ensure the generated, unique arrays are created in a known order with known symbols. While it is unfortunate to no longer discover these symbols at generation time, it does allow us to ignore unwanted symbols and perform fewer string-to-enumeration conversions at lookup time.	2021-12-11 14:17:47 +00:00
Timothy Flynn	9cc323b0b0	LibUnicode: Generate unique NumberFormat lists for each Unit	2021-12-11 14:17:47 +00:00
Timothy Flynn	cdbfe01827	LibUnicode: Generate unique NumberFormat lists for each NumberSystem	2021-12-11 14:17:47 +00:00
Timothy Flynn	945ca81dd7	LibUnicode: Generate unique number format structures Add unique storage for parsed NumberFormat structures to ensure only one copy of each structure is generated. Reduces libunicode.so on x86 from 13.2 MB to 11.4 MB.	2021-12-06 15:46:34 +01:00
Timothy Flynn	914675e826	LibJS+LibUnicode: Separate number formatting methods from Locale.h Currently, we generate separate data files for locale and number format related tables/methods, but provide public accessors for all of the data in one Locale.h file. Rather than continuing this trend for date-time, relative time, etc. formatting, it's a bit easier to reason about if the public accessors are also in separate files.	2021-11-29 22:48:46 +00:00
Timothy Flynn	0aa3e5c2ea	LibUnicode: Port generator utility methods to ErrorOr Most of these were VERIFY-ing for success, but propagating an error message up to serenity_main() is much nicer than just a SIGABRT.	2021-11-23 22:58:05 +01:00
Timothy Flynn	55e0b91d8d	LibUnicode: Port GenerateUnicodeNumberFormat to ErrorOr and LibMain	2021-11-23 22:58:05 +01:00
Timothy Flynn	4b535ce1c8	LibUnicode: Stop passing the cldr-core package to UnicodeNumberFormat This is no longer needed now that this generator isn't parsing the default-content locales.	2021-11-19 11:45:35 +01:00
Timothy Flynn	a13fa15a30	LibUnicode: Generate default-content locales as aliases Previously, we were just copying the locale data into default-content locales (for example, copying the "en" data into "en-US"). Instead, we can just define the default-content locales as aliases to their main locales.	2021-11-19 11:45:35 +01:00
Andreas Kling	587f9af960	AK: Make JSON parser return ErrorOr<JsonValue> (instead of Optional) Also add slightly richer parse errors now that we can include a string literal with returned errors. This will allow us to use TRY() when working with JSON data.	2021-11-17 00:21:10 +01:00
Timothy Flynn	cafb717486	LibUnicode: Parse and generate CLDR unit data for Intl.NumberFormat The units data is in another CLDR package, cldr-units.	2021-11-16 23:14:09 +00:00
Timothy Flynn	c24a350a18	LibUnicode: Ignore U+200F when parsing format identifiers Noticed this while implementing multiple identifier support. We were errantly parsing U+200F as a lone identifier in some Hebrew formats.	2021-11-16 23:14:09 +00:00
Timothy Flynn	04b8b87c17	LibJS+LibUnicode: Support multiple identifiers within format pattern This wasn't the case for compact patterns, but unit patterns can contain multiple (up to 2, really) identifiers that must each be recognized by LibJS. Each generated NumberFormat object now stores an array of identifiers parsed. The format pattern itself is encoded with the index into this array for that identifier, e.g. the compact format string "0K" will become "{number}{compactIdentifier:0}".	2021-11-16 23:14:09 +00:00
Timothy Flynn	3b68370212	LibJS+LibUnicode: Rename the generated compact_identifier to identifier This field is currently used to store the StringView into the compact name/symbol in the format string. Units will need to store a similar field, so rename the field to be more generic, and extract the parser for it.	2021-11-16 23:14:09 +00:00
Timothy Flynn	1f546476d5	LibJS+LibUnicode: Fix computation of compact pattern exponents The compact scale of each formatting rule was precomputed in commit: `be69eae651` Using the formula: compact scale = magnitude - pattern scale This computation was off-by-one. For example, consider the format key "10000-count-one", which maps to "00 thousand" in en-US. What we are really after is the exponent that best represents the string "thousand" for values greater than 10000 and less than 100000 (the next format key). We were previously doing: log10(10000) - "00 thousand".count("0") = 2 Which clearly isn't what we want. Instead, if we do: log10(10000) + 1 - "00 thousand".count("0") = 3 We get the correct exponent for each format key for each locale. This commit also renames the generated variable from "compact_scale" to "exponent" to match the terminology used in ECMA-402.	2021-11-16 00:56:55 +00:00
Timothy Flynn	48d5684780	LibUnicode: Parse compact identifiers and replace them with a format key For example, in en-US, the decimal, long compact pattern for numbers between 10,000 and 100,000 is "00 thousand". In that pattern, "thousand" is the compact identifier, and the generated format pattern is now "{number} {compactIdentifier}". This also generates that identifier as its own field in the NumberFormat structure.	2021-11-16 00:56:55 +00:00
Timothy Flynn	30fbb7d9cd	LibUnicode: Parse and generate scientific formatting rules	2021-11-14 17:00:35 +00:00
Timothy Flynn	3645f6a0fc	LibUnicode: Fix typo in percent format parser Just by sheer luck this had no actual effect because the decimal format prefix has the same length as the percent format prefix.	2021-11-14 17:00:35 +00:00
Timothy Flynn	3b7f5af042	LibUnicode: Generate primary and secondary number grouping sizes Most locales have a single grouping size (the number of integer digits to be written before inserting a grouping separator). However some have a primary and secondary size. We parse the primary size as the size used for the least significant integer digits, and the secondary size for the most significant.	2021-11-14 10:35:19 +00:00
Timothy Flynn	c65dea64bd	LibJS+LibUnicode: Don't remove {currency} keys in GetNumberFormatPattern In order to implement Intl.NumberFormat.prototype.formatToParts, do not replace {currency} keys in the format pattern before ECMA-402 tells us to. Otherwise, the array returned by formatToParts will not contain the expected currency key. Early replacement was done to avoid resolving the currency display more than once, as it involves a couple of round trips to search through LibUnicode data. So this adds a non-standard method to NumberFormat to do this resolution and cache the result. Another side effect of this change is that LibUnicode must replace unit format patterns of the form "{0} {1}" during code generation. These were previously skipped during code generation because LibJS would just replace the keys with the currency display at runtime. But now that the currency display injection is delayed, any {0} or {1} keys in the format pattern will cause PartitionNumberPattern to abort.	2021-11-13 19:01:25 +00:00
Timothy Flynn	a701ed52fc	LibJS+LibUnicode: Fully implement currency number formatting Currencies are a bit strange; the layout of currency data in the CLDR is not particularly compatible with what ECMA-402 expects. For example, the currency format in the "en" and "ar" locales for the Latin script are: en: "¤#,##0.00" ar: "¤\u00A0#,##0.00" Note how the "ar" locale has a non-breaking space after the currency symbol (¤), but "en" does not. This does not mean that this space will appear in the "ar"-formatted string, nor does it mean that a space won't appear in the "en"-formatted string. This is a runtime decision based on the currency display chosen by the user ("$" vs. "USD" vs. "US dollar") and other rules in the Unicode TR-35 spec. ECMA-402 shies away from the nuances here with "implementation-defined" steps. LibUnicode will store the data parsed from the CLDR however it is presented; making decisions about spacing, etc. will occur at runtime based on user input.	2021-11-13 11:52:45 +00:00
Timothy Flynn	e9493a2cd5	LibUnicode: Ensure UnicodeNumberFormat is aware of default content For example, there isn't a unique set of data for the en-US locale; rather, it defaults to the data for the en locale. See this commit for much more detail: `357c97dfa8`	2021-11-13 11:52:45 +00:00
Timothy Flynn	9421d5c0cf	LibUnicode: Generate currency unit-pattern number formats These are used when formatting a number as currency with a display option of "name" (e.g. for USD, the name is "US Dollars" in en-US). These patterns appear in the CLDR in a different manner than other number formats that are pluralized. They are of the form "{0} {1}" and therefore do not undergo subpattern replacements.	2021-11-13 11:52:45 +00:00
Timothy Flynn	6cfd63e5bd	LibUnicode: Parse numbers in number formats a bit more leniently The parser was previously expecting number sections within a pattern to start with "#", but they may also begin with "0".	2021-11-13 11:52:45 +00:00
Timothy Flynn	1f2ac0ab41	LibUnicode: Move number formatting code generator to UnicodeNumberFormat	2021-11-12 20:46:38 +00:00

49 commits
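Commit becec3578f describes storing generated strings as one run-length-encoded array plus a second array holding each string's start index, with a generated method that decodes an entry back into a string view. The sketch below illustrates that two-array layout in standard C++. The `RleStringTable` name, the (run length, byte) pair encoding, and the 0 terminator are assumptions made for illustration, not the generator's actual output format; the real generator also emits these arrays as constant data so they live in the read-only section, whereas this sketch builds them at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical layout: every string is stored as (run length, byte) pairs
// followed by a 0 terminator; indices[id] is where string `id` starts.
struct RleStringTable {
    std::vector<std::uint8_t> encoded; // all strings' runs, back to back
    std::vector<std::size_t> indices;  // start offset of each string in `encoded`

    // Appends `s` to the table and returns its id.
    std::size_t add(std::string_view s)
    {
        indices.push_back(encoded.size());
        for (std::size_t i = 0; i < s.size();) {
            // Measure the run of identical bytes starting at i (capped at 255).
            std::size_t run = 1;
            while (i + run < s.size() && s[i + run] == s[i] && run < 255)
                ++run;
            encoded.push_back(static_cast<std::uint8_t>(run));
            encoded.push_back(static_cast<std::uint8_t>(s[i]));
            i += run;
        }
        encoded.push_back(0); // a run length of 0 marks the end of the string
        return indices.size() - 1;
    }

    // Expands string `id` into `buffer` and returns a view of it.
    std::string_view decode(std::size_t id, std::string& buffer) const
    {
        assert(id < indices.size());
        buffer.clear();
        for (std::size_t i = indices[id]; encoded[i] != 0; i += 2)
            buffer.append(encoded[i], static_cast<char>(encoded[i + 1]));
        return buffer;
    }
};
```

Decoding into a caller-supplied buffer keeps the returned view valid until that buffer is reused, which avoids handing out views into shared scratch storage.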
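Commits 3b7f5af042 and 2d2f713426 describe primary and secondary grouping sizes and the per-locale minimumGroupingDigits value. The following is a minimal sketch of how those three values could combine when inserting separators. It assumes the separator is inserted only when the integer has at least primary + minimum digits, and that every group above the least significant one uses the secondary size. The function name and signature are hypothetical, and a plain space stands in for whatever separator character a locale actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: inserts `separator` into a string of integer digits.
// primary:   size of the least significant group (e.g. 3)
// secondary: size of every more significant group (3 in most locales, 2 in en-IN)
// minimum_grouping_digits: grouping happens only if the integer has at least
//                          primary + minimum_grouping_digits digits
std::string group_integer_digits(std::string_view digits, std::size_t primary, std::size_t secondary,
    std::size_t minimum_grouping_digits, std::string_view separator)
{
    assert(primary > 0 && secondary > 0);
    if (digits.size() < primary + minimum_grouping_digits)
        return std::string(digits);

    // Peel groups off the least significant end: first `primary`, then `secondary`.
    std::vector<std::string_view> groups;
    std::size_t end = digits.size();
    std::size_t size = primary;
    while (end > size) {
        groups.push_back(digits.substr(end - size, size));
        end -= size;
        size = secondary;
    }
    groups.push_back(digits.substr(0, end)); // most significant (possibly shorter) group

    // Reassemble from most to least significant.
    std::string result;
    for (auto it = groups.rbegin(); it != groups.rend(); ++it) {
        if (it != groups.rbegin())
            result += separator;
        result += *it;
    }
    return result;
}
```

With a minimum of 1 this reproduces "1,000" and "10,000"; with a minimum of 2 it yields "1000" and "10 000", matching the en-US vs. pl-PL examples in 2d2f713426.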
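Commit 1f546476d5 corrects the compact exponent to log10(key) + 1 - (number of "0" characters in the pattern). A small sketch of that formula follows. It assumes format keys have the shape "10000-count-one" with a power of ten before the first '-', so log10(key) can be taken as the digit count minus one. It also assumes every '0' in the pattern is a digit placeholder. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// Hypothetical helper: exponent = log10(key) + 1 - count of '0' in the pattern.
// format_key: e.g. "10000-count-one"; pattern: e.g. "00 thousand".
int compact_exponent(std::string_view format_key, std::string_view pattern)
{
    // Keep only the numeric part before the first '-' (whole key if there is none).
    std::string_view magnitude_key = format_key.substr(0, format_key.find('-'));
    // For a power of ten written with d digits, log10(key) == d - 1.
    int log10_key = static_cast<int>(magnitude_key.size()) - 1;
    // Each '0' in the pattern is one integer digit shown before the identifier.
    int pattern_digits = static_cast<int>(std::count(pattern.begin(), pattern.end(), '0'));
    return log10_key + 1 - pattern_digits;
}
```

For "10000-count-one" and "00 thousand" this gives 4 + 1 - 2 = 3, whereas the previous formula without the + 1 gave 2.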
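Several commits (945ca81dd7, 2a7f36b392 and the later "Generate unique ..." entries) rely on storing each distinct generated structure once and referring to it by index. For example, 2a7f36b392 notes that only 39 of 443 numeric symbol arrays are unique. The following generic sketch shows that deduplication idea. The `UniqueStorage` class and its `ensure`/`get` members are hypothetical names for illustration. The storage helper the generators actually use may differ in naming, hashing, and how indices are numbered.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical deduplicating store: each distinct value is kept once, and
// callers refer to it by index (generated tables would then hold indices).
// T must support operator< because std::map is used for the lookup.
template<typename T>
class UniqueStorage {
public:
    // Returns the index of `value`, storing it first if it is new.
    std::size_t ensure(T const& value)
    {
        auto [it, inserted] = m_indices.try_emplace(value, m_values.size());
        if (inserted)
            m_values.push_back(value);
        return it->second;
    }

    T const& get(std::size_t index) const { return m_values.at(index); }
    std::size_t size() const { return m_values.size(); }

private:
    std::vector<T> m_values;            // unique values in insertion order
    std::map<T, std::size_t> m_indices; // value -> index into m_values
};
```

Emitting only `m_values` plus per-object indices is what lets repeated structures collapse to a single copy in the generated data.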