0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	becec3578f	LibTimeZone+LibUnicode: Generate string data with run-length encoding Currently, the unique string lists are stored in the initialized data sections of their shared libraries. In order to move the data to the read-only section, generate the strings using RLE arrays. We generate two arrays: the first is the RLE data itself, the second is a list of indices into the RLE array for each string. We then generate a decoding method to convert an RLE string to a StringView.	2022-08-16 16:56:17 +02:00
Timothy Flynn	f8f7015419	LibUnicode: Generate a method to lookup locale-preferred keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	80568d5776	LibUnicode: Generate a method to lookup available keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	c2e5b20eb6	LibUnicode: Generate available values for the keywords co, kf, kn, hc This also ensures we only include values we actually support in the generated list of available values.	2022-07-15 12:31:43 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
Timothy Flynn	4868b888be	LibUnicode: Generate per-locale text layout information Currently contains just each locale's character order, but is set up to easily add other text layout fields from the CLDR if ECMA-402 eventually requires them.	2022-07-06 16:56:42 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses by String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00
Timothy Flynn	63c3437274	LibUnicode: Use BCP 47 data to generate available calendars and numbers BCP 47 will be the single source of truth for known calendar and number system keywords, and their aliases (e.g. "gregory" is an alias for "gregorian"). Move the generation of available keywords to where we parse the BCP 47 data, so that hard-coded aliases may be removed from other generators.	2022-02-16 07:23:07 -05:00
Timothy Flynn	89ead8c00a	LibJS+LibUnicode: Parse Unicode keywords from the BCP 47 CLDR package We have a fair amount of hard-coded keywords / aliases that can now be replaced with real data from BCP 47. As a result, this also changes the awkward way we were previously generating keys. Before, we were more or less generating keywords as a CSV list of keys, e.g. for the "nu" key, we'd generate "latn,arab,grek" (ordered by locale preference). Then at runtime, we'd split on the comma. We now just generate spans of keywords directly.	2022-02-16 07:23:07 -05:00
Timothy Flynn	d0fc61e79b	LibUnicode: Extract the BCP 47 package from the CLDR This package was originally meant to be included in CLDR version 40, but was missed in their release scripts. This has been resolved: https://unicode-org.atlassian.net/browse/CLDR-15158 Unfortunately, the CLDR was re-released with the same version number. So to bust the build's CLDR cache, change the "version" used to detect that we need to redownload the CLDR.	2022-02-16 07:23:07 -05:00
Timothy Flynn	a338e9403b	LibUnicode: Port the CLDR locale generator to the stream API This adds a generator utility to read an entire file and parse it as a JSON value. This is heavily used by the CLDR generators. The idea here is to put the file reading details in the utility so that when we have a good story for generically reading an entire stream in LibCore, we can update the generators to use that by only touching this helper.	2022-02-14 11:39:46 -05:00
Timothy Flynn	6efbafa6e0	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	2022-01-31 18:23:22 +00:00
Timothy Flynn	bb0f548614	LibUnicode: Generate a list of available currencies	2022-01-31 00:32:41 +00:00
Timothy Flynn	4d43aeae30	LibUnicode: Fill in case-first and numeric BCP47 keywords Unlike other BCP47 keywords that we are parsing, these only appear in the BCP47 XML file itself within the CLDR. The values are very simple though, so just hard code them until the Unicode org re-releases the CLDR with BCP47: https://unicode-org.atlassian.net/browse/CLDR-15158	2022-01-29 20:27:24 +00:00
Timothy Flynn	bced4e9324	LibJS+LibUnicode: Convert Intl.ListFormat to use Unicode::Style Remove ListFormat's own definition of the Style enum, which was further duplicated by a generated ListPatternStyle enum with the same values.	2022-01-25 19:02:59 +00:00
Timothy Flynn	c86f7a675d	LibUnicode: Do not limit language display names to known locales Currently, the UnicodeLocale generator collects a list of known locales from the CLDR before processing language display names. For each locale, the identifier is broken into language, script, and region subtags, and we create a list of seen languages. When processing display names, we skip languages we hadn't seen in that first step. This is insufficient for language display names like "en-GB", which do not have a locale entry in the CLDR, and thus are skipped. So instead, create the list of known languages by actually reading through the list of languages which have a display name.	2022-01-13 23:05:31 +01:00
Timothy Flynn	91acc2e9c5	LibUnicode: Parse and generate locale display patterns These patterns indicate how to display locale strings when that locale contains multiple subtags. For example, "en-US" would be displayed as "English (United States)".	2022-01-13 23:05:31 +01:00
Timothy Flynn	0d75949827	LibUnicode: Parse and generate locale display names for date fields	2022-01-13 13:43:57 +01:00
Timothy Flynn	7f162c471d	LibUnicode: Parse and generate locale display names for calendars Note there's a bit of an unfortunate duplication in the calendar enum generated by UnicodeLocale and the existing enum generated by UnicodeDateTimeFormat. The former contains every calendar known to the CLDR, whereas the latter contains the calendars we've actually parsed for DateTimeFormat (currently only Gregorian). The new enum generated here can be removed once DateTimeFormat knows about all calendars.	2022-01-13 13:43:57 +01:00
Timothy Flynn	6da1bfeeea	Meta: Support generating case-insensitive value-from-string methods This also extracts the default parameters for generate_value_from_string to a structure. This is just to make it cleaner to add new options.	2022-01-11 00:36:45 +01:00
Timothy Flynn	f576142fe8	LibJS+LibUnicode: Convert UnicodeLocale to link with weak symbols	2022-01-04 22:49:43 +00:00
Timothy Flynn	cf8e11a562	LibUnicode: Add temporary overload of value-from-string generator This is a temporary mechanism while LibUnicode is in an in-between state where some symbols are weakly linked and others are dynamically loaded. The latter require an asm() label to be loaded.	2022-01-04 22:49:43 +00:00
Timothy Flynn	52394deece	LibUnicode: Remove now unused value-from-string generator overload The generate_value_from_string_for_dynamic_loading() overload was just temporary until all generators were switched over to dynamic loading.	2021-12-21 13:09:49 -08:00
Timothy Flynn	09be26b5d2	LibUnicode: Dynamically load the generated UnicodeLocale symbols	2021-12-21 13:09:49 -08:00
Timothy Flynn	ce6c515873	LibUnicode: Generate unique list patterns and lists of list patterns	2021-12-13 21:28:56 -08:00
Timothy Flynn	0ad2decd04	LibUnicode: Generate unique list of keyword values	2021-12-13 21:28:56 -08:00
Timothy Flynn	0c6cc4ad96	LibUnicode: Generate unique lists of localized currencies	2021-12-13 21:28:56 -08:00
Timothy Flynn	a45f2ccc25	LibUnicode: Generate unique lists of languages, territories, and scripts	2021-12-13 21:28:56 -08:00
Timothy Flynn	bf79c73158	LibUnicode: Do not generate data for "generic" calendars This is not a calendar supported by ECMA-402, so let's not waste space with its data. Further, don't generate "gregorian" as a valid Unicode locale extension keyword. It's an invalid type identifier, thus cannot be used in locales such as "en-u-ca-gregorian".	2021-12-01 16:36:26 +00:00
Timothy Flynn	71903ea7e1	LibUnicode: Parse and generate calendar (ca) Unicode keywords Also removes a few fly-by "StringView x = nullptr;" unnecessary initializers.	2021-11-29 22:48:46 +00:00
Timothy Flynn	0aa3e5c2ea	LibUnicode: Port generator utility methods to ErrorOr Most of these were VERIFY-ing for success, but propagating an error message up to serenity_main() is much nicer than just a SIGABRT.	2021-11-23 22:58:05 +01:00
Timothy Flynn	8c5f19f7c8	LibUnicode: Port GenerateUnicodeLocale to ErrorOr and LibMain	2021-11-23 22:58:05 +01:00
Timothy Flynn	93ee922027	LibUnicode: Support locales-without-script aliases for ECMA-402 As noted by ECMA-402, if a supported locale contains all of a language, script, and region subtag, then the implementation must also support the locale without the script subtag. The most complicated example of this is the zh-TW locale. The list of locales in the CLDR database does not include zh-TW or its maximized zh-Hant-TW variant. Instead, it includes the zh-Hant locale. However, zh-Hant-TW is listed in the default-content locale list in the cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We must then also support the zh-Hant-TW alias without the script subtag: zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite heavily tested by test262.	2021-11-19 11:45:35 +01:00
Timothy Flynn	a13fa15a30	LibUnicode: Generate default-content locales as aliases Previously, we were just copying the locale data into default-content locales (for example, copying the "en" data into "en-US"). Instead, we can just define the default-content locales as aliases to their main locales.	2021-11-19 11:45:35 +01:00
Andreas Kling	587f9af960	AK: Make JSON parser return ErrorOr<JsonValue> (instead of Optional) Also add slightly richer parse errors now that we can include a string literal with returned errors. This will allow us to use TRY() when working with JSON data.	2021-11-17 00:21:10 +01:00
Timothy Flynn	e9493a2cd5	LibUnicode: Ensure UnicodeNumberFormat is aware of default content For example, there isn't a unique set of data for the en-US locale; rather, it defaults to the data for the en locale. See this commit for much more detail: `357c97dfa8`	2021-11-13 11:52:45 +00:00
Timothy Flynn	39e031c4dd	LibJS+LibUnicode: Generate all styles of currency localizations Currently, LibUnicode is only parsing and generating the "long" style of currency display names. However, the CLDR contains "short" and "narrow" forms as well that need to be handled. Parse these, and update LibJS to actually respect the "style" option provided by the user for displaying currencies with Intl.DisplayNames. Note: There are some discrepancies between the engines on how style is handled. In particular, running: new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd') Gives: SpiderMonkey: "USD" V8: "US Dollar" LibJS: "$" And running: new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd') Gives: SpiderMonkey: "$" V8: "US Dollar" LibJS: "$" My best guess is V8 isn't handling style, and just returning the long form (which is what LibJS did before this commit). And SpiderMonkey can handle some styles, but if they don't have a value for the requested style, they fall back to the canonicalized code passed into of().	2021-11-13 11:52:45 +00:00
Timothy Flynn	1f2ac0ab41	LibUnicode: Move number formatting code generator to UnicodeNumberFormat	2021-11-12 20:46:38 +00:00
Timothy Flynn	04e6b43f05	LibUnicode: Move (soon-to-be) common code out of GenerateUnicodeLocale The data used for number formatting is going to grow quite a bit when the cldr-units package is parsed. To prevent the generated UnicodeLocale file from growing outrageously large, the number formatting data can go into its own file. To prepare for this, move code that will be common between the generators for UnicodeLocale and UnicodeNumberFormat to the utility header.	2021-11-12 20:46:38 +00:00
Timothy Flynn	be69eae651	LibUnicode: Precompute the compact scale of each number formatting rule This will be needed for the ComputeExponentForMagnitude AO for compact formatting, namely step 5b: Let exponent be an implementation- and locale-dependent (ILD) integer by which to scale a number of the given magnitude in compact notation for the current locale.	2021-11-12 09:17:08 +00:00
Timothy Flynn	230b133ee3	LibUnicode: Parse number formats into zero/positive/negative patterns A number formatting pattern in the CLDR contains one or two entries, delimited by a semi-colon. Previously, LibUnicode was just storing the entire pattern as one string. This changes the generator to split the pattern on that delimiter and generate the 3 unique patterns expected by ECMA-402. The rules for generating the 3 patterns are as follows: * If the pattern contains 1 entry, it is the zero pattern. The positive pattern is the zero pattern prepended with {plusSign}. The negative pattern is the zero pattern prepended with {minusSign}. * If the pattern contains 2 entries, the first is the zero pattern, and the second is the negative pattern. The positive pattern is the zero pattern prepended with {plusSign}.	2021-11-12 09:17:08 +00:00
Timothy Flynn	1244ebcd4f	LibUnicode: Parse and generate standard accounting formatting rules Also known as "currency-accounting" in some CLDR documentation.	2021-11-12 09:17:08 +00:00
Timothy Flynn	967afc1b84	LibUnicode: Parse and generate standard currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	bffd73e0d4	LibUnicode: Parse and generate standard decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	feb8c22a62	LibUnicode: Parse and generate standard percentage formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	4317a1b552	LibUnicode: Parse and generate compact currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	604a596c90	LibUnicode: Parse and generate compact decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	12b468a588	LibUnicode: Begin parsing and generating locale number systems The number system data in the CLDR contains information on how to format numbers in a locale-dependent manner. Start parsing this data, beginning with numeric symbol strings. For example the symbol NaN maps to "NaN" in the en-US locale, and "非數值" in the zh-Hant locale.	2021-11-12 09:17:08 +00:00
Timothy Flynn	d3e83c9934	LibUnicode: Parse alternate default numbering systems Some locales in the CLDR have alternate default numbering systems listed under "defaultNumberingSystem-alt-*", e.g.: "defaultNumberingSystem": "arab", "defaultNumberingSystem-alt-latn": "latn", "otherNumberingSystems": { "native": "arab" }, We were previously only parsing "defaultNumberingSystem" and "otherNumberingSystems". This odd format appears to be an artifact of converting from XML.	2021-11-12 09:17:08 +00:00
Timothy Flynn	ae66188d43	LibUnicode: Capitalize generated identifiers in lieu of full title case This isn't particularly important because this generates code that is quite hidden from outside callers. But when viewing the generated code, it's a bit nicer to read e.g. enum identifiers such as "MinusSign" rather than "Minussign".	2021-11-12 09:17:08 +00:00
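
The pattern-splitting rules described in commit 230b133ee3 can be illustrated with a minimal sketch. This is not the LibUnicode generator code: it uses the C++ standard library rather than AK, and the names `NumberPatterns` and `split_number_pattern` are hypothetical, chosen only for this example. It assumes the input holds one or two entries separated by a single semicolon, as the commit message states.

```cpp
#include <string>
#include <string_view>

// Hypothetical container for the three patterns expected by ECMA-402.
// Not the type used by the real generator.
struct NumberPatterns {
    std::string zero;
    std::string positive;
    std::string negative;
};

// Splits a CLDR-style number pattern on ';' and derives the three patterns
// following the rules listed in the commit message.
NumberPatterns split_number_pattern(std::string_view pattern)
{
    NumberPatterns result;
    auto separator = pattern.find(';');

    // The first (or only) entry is always the zero pattern.
    // substr(0, npos) yields the whole string when there is no ';'.
    result.zero = std::string(pattern.substr(0, separator));

    // The positive pattern is always the zero pattern prefixed with {plusSign}.
    result.positive = "{plusSign}" + result.zero;

    if (separator == std::string_view::npos) {
        // One entry: the negative pattern is derived from the zero pattern.
        result.negative = "{minusSign}" + result.zero;
    } else {
        // Two entries: the second entry is the negative pattern as given.
        result.negative = std::string(pattern.substr(separator + 1));
    }

    return result;
}
```

Under these assumptions, a single-entry input such as `#,##0.###` yields a negative pattern of `{minusSign}#,##0.###`, while a two-entry input such as `#,##0.00;(#,##0.00)` keeps `(#,##0.00)` as the negative pattern unchanged.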

73 commits