0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	b7ef36aa36	LibUnicode: Parse and generate custom emoji added for SerenityOS Parse emoji from emoji-serenity.txt to allow displaying their names and grouping them together in the EmojiInputDialog. This also adds an "Unknown" value to the EmojiGroup enum. This will be useful for emoji that aren't found in the UCD, or for when UCD downloads are disabled.	2022-09-11 20:33:57 +01:00
Timothy Flynn	0aadd4869d	LibUnicode: Generate emoji data for non-fully-qualified emoji This allows us to find emoji data for files such as /res/emoji/U+A9.png. U+00A9 is not fully-qualified (its full form is U+00A9 U+FE0F). But the UCD has unqualified data for this code point; generating it allows us to categorize these emoji appropriately in the EmojiInputDialog.	2022-09-11 20:33:57 +01:00
Timothy Flynn	b61eca0a1e	LibUnicode: Parse and generate emoji code point data According to TR #51, the "best definition of the full set [of emojis] is in the emoji-test.txt file". This defines not only the emoji themselves, but the order in which they should be displayed, and what "group" of emojis they belong to.	2022-09-08 23:12:31 +01:00
Timothy Flynn	f082b6ae48	LibUnicode: Generate a separate Locale enumeration for special casing The UCD only cares about a few locales for special casing rules (az, lt, and tr). Unfortunately, LibUnicode cannot use LibLocale once the libraries are separate because LibLocale will need to use LibUnicode for many more things; thus there would be a circular dependency. Instead, just generate the small enum needed for this one use case.	2022-09-05 14:37:16 -04:00
Timothy Flynn	43a3471298	LibLocale: Move locale source files to the LibLocale folder These are still included in LibUnicode, but this updates their location and the include paths of other files which include them.	2022-09-05 14:37:16 -04:00
Timothy Flynn	ff48220dca	Userland: Move files destined for LibLocale to the Locale namespace	2022-09-05 14:37:16 -04:00
Timothy Flynn	1e0276f541	LibLocale+LibUnicode: Move generated CLDR data files to LibLocale folder They are still included into LibUnicode, but this moves their generated location to be under LibLocale.	2022-09-05 14:37:16 -04:00
Timothy Flynn	89d1813b5d	LibUnicode: Move CLDR data generators to a LibLocale subfolder To prepare for placing all CLDR generated data in a new library, LibLocale, this moves the code generators for the CLDR data to the LibLocale subfolder.	2022-09-05 14:37:16 -04:00
davidot	cd763de280	LibJS+LibUnicode: Move some constant arrays to a separate header Since LibUnicode depends on this data it used to include Intl/AbstractOperations which in turn includes a number of other LibJS headers. By moving this to its own header with minimal includes we can save on rebuilding LibUnicode for unrelated LibJS header changes.	2022-08-27 10:55:44 -04:00
Timothy Flynn	ca92e37ae0	LibUnicode: Generate code point display names with run-length encoding Similar to commit `becec35`, our code point display name data was a large list of StringViews. RLE can be used here as well to move about 32 MB from the initialized data section to the read-only section. Some of the refactoring to store strings as indices into an RLE array also lets us clean up some of the code point name generators.	2022-08-17 15:42:12 +01:00
Timothy Flynn	2c2ede8581	LibUnicode: Mark UniqueStringStorage::generate as constant This is just to allow it to be invoked from callers who hold a constant UniqueStringStorage instance.	2022-08-17 15:42:12 +01:00
Timothy Flynn	becec3578f	LibTimeZone+LibUnicode: Generate string data with run-length encoding Currently, the unique string lists are stored in the initialized data sections of their shared libraries. In order to move the data to the read-only section, generate the strings using RLE arrays. We generate two arrays: the first is the RLE data itself, the second is a list of indices into the RLE array for each string. We then generate a decoding method to convert an RLE string to a StringView.	2022-08-16 16:56:17 +02:00
Timothy Flynn	ae2acc8cdf	LibJS+LibUnicode: Generate a set of default DateTimeFormat patterns This isn't called out in TR-35, but before ICU even looks at CLDR data, it adds a hard-coded set of default patterns to each locale's calendar. It has done this since 2006 when its DateTimeFormat feature was first created. Several test262 tests depend on this, which under ECMA-402, falls into "implementation defined" behavior. For compatibility, we can do the same in LibUnicode.	2022-07-22 23:51:56 +01:00
Timothy Flynn	32c07bc6c3	LibUnicode: Generate per-locale data for the "noon" fixed day period Note that not all locales have this day period.	2022-07-21 20:36:03 +01:00
Timothy Flynn	16b673eaa9	LibUnicode: Check whether a calendar symbol for a locale actually exists In the generated unique string list, index 0 is the empty string, and is used to indicate a value doesn't exist in the CLDR. Check for this before returning an empty calendar symbol. For example, an upcoming commit will add the fixed day period "noon", which not all locales support.	2022-07-21 20:36:03 +01:00
Timothy Flynn	0f26ab89ae	LibJS+LibUnicode: Handle flexible day periods on both sides of midnight Commit `ec7d535` only partially handled the case of flexible day periods rolling over midnight, in that it only worked for hours after midnight. For example, the en locale defines a day period range of [21:00, 06:00). The previous method of adding 24 hours to the given hour would change e.g. 23:00 to 47:00, which isn't valid.	2022-07-21 20:36:03 +01:00
Timothy Flynn	b2709f161e	LibUnicode: Generate per-locale approximately & range separator symbols	2022-07-20 22:30:16 +01:00
Timothy Flynn	b24b9c0a65	LibUnicode: Fallback to per-locale default calendars When patterns, symbols, etc. for a requested calendar are not found, use the locale's default calendar.	2022-07-15 12:31:43 +02:00
Timothy Flynn	c849cb9d76	LibUnicode: Fallback to per-locale default numbering systems When patterns, grouping digits, symbols, etc. for a requested numbering system are not found, use the locale's default numbering system. This will allow using the correct digits e.g. for the locale "en-u-nu-arab" even though the "en" locale only contains patterns for the "latn" numbering system.	2022-07-15 12:31:43 +02:00
Timothy Flynn	f8f7015419	LibUnicode: Generate a method to lookup locale-preferred keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	80568d5776	LibUnicode: Generate a method to lookup available keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	c2e5b20eb6	LibUnicode: Generate available values for the keywords co, kf, kn, hc This also ensures we only include values we actually support in the generated list of available values.	2022-07-15 12:31:43 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
sin-ack	e5f09ea170	Everywhere: Split Error::from_string_literal and Error::from_string_view Error::from_string_literal now takes direct char const*s, while Error::from_string_view does what Error::from_string_literal used to do: taking StringViews. This change will remove the need to insert `sv` after error strings when returning string literal errors once StringView(char const*) is removed. No functional changes.	2022-07-12 23:11:35 +02:00
sin-ack	7456904a39	Meta+Userland: Simplify some formatters These are mostly minor mistakes I've encountered while working on the removal of StringView(char const*). The usage of builder.put_string over Format<FormatString>::format is preferable as it will avoid the indirection altogether when there's no formatting to be done. Similarly, there is no need to do format(builder, "{}", number) when builder.put_u64(number) works equally well. Additionally a few Strings where only constant strings were used are replaced with StringViews.	2022-07-12 23:11:35 +02:00
Timothy Flynn	a337b059dd	LibUnicode: Parse and generate per-locale plural ranges	2022-07-12 00:43:34 +01:00
Timothy Flynn	232df4196b	LibUnicode: Replace NumberFormat::Plurality with Unicode::PluralCategory To prepare for using plural rules within number & duration format, this removes the NumberFormat::Plurality enumeration. This also adds PluralCategory::ExactlyZero & PluralCategory::ExactlyOne. These are used in locales like French, where PluralCategory::One really means any value from 0.00 to 1.99. PluralCategory::ExactlyOne means only the value 1, as the name implies. These exact rules are not known by the general plural rules, they are explicitly for number / currency format.	2022-07-08 20:33:52 +02:00
Timothy Flynn	cc5c707649	LibJS+LibUnicode: Do not generate the PluralCategory enum The PluralCategory enum is currently generated for plural rules. Instead of generating it, this moves the enum to the public LibUnicode header. While it was nice to auto-discover these values, they are well defined by TR-35, and we will need their values from within the number format code generator (which can't rely on the plural rules generator having run yet). Further, number format will require additional values in the enum that plural rules doesn't know about.	2022-07-08 20:33:52 +02:00
Timothy Flynn	bf85bf2a9e	LibJS: Use Intl.PluralRules within Intl.RelativeFormat The Polish test cases added here cover previous failures from test262, due to the way that 0 is specified to be "many" in Polish.	2022-07-08 11:51:54 +02:00
Timothy Flynn	8aeacccd82	LibUnicode: Generate a list of available plural categories per locale Separate lists are generated for cardinal and ordinal form.	2022-07-08 11:51:54 +02:00
Timothy Flynn	ea78bac36d	LibUnicode: Parse and generate per-locale plural rules from the CLDR Plural rules in the CLDR are of the form: "cs": { "pluralRule-count-one": "i = 1 and v = 0 @integer 1", "pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4", "pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...", "pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..." } The syntax is described here: https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax There are up to 2 sets of rules for each locale, a cardinal set and an ordinal set. The approach here is to generate a C++ function for each set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is transpiled to a C++ if-statement within its function. Then lookup tables are generated to match locales to their generated functions. NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile flags because the generated plural rules have lots of extra parentheses (because e.g. we need to selectively negate and combine rules). The code to generate only exactly the right number of parentheses is quite hairy, so this just tells the compiler to ignore the extras.	2022-07-08 11:51:54 +02:00
Timothy Flynn	12e7c0808a	LibUnicode: Generate per-region week data This includes: * The minimum number of days in a week for that week to count as the first week of a new year. * The day to be shown as the first day of the week in a calendar. * The start/end days of the weekend. Like the existing hour cycle data, week data is presented per-region in the CLDR, rather than per-locale. The method to add likely subtags to a locale to perform region lookups is the same. The list of regions in the CLDR for hour cycle, minimum days, first day, and weekend days are quite different. So rather than changing the existing HourCycleRegion enum to a generic Region enum, we generate separate enums for each of the week data fields. This allows each lookup into these fields to remain simple array-based index access, without any "jumps" for regions that don't have CLDR data for a field.	2022-07-06 16:56:42 +02:00
Timothy Flynn	4868b888be	LibUnicode: Generate per-locale text layout information Currently contains just each locale's character order, but is set up to easily add other text layout fields from the CLDR if ECMA-402 eventually requires them.	2022-07-06 16:56:42 +02:00
Andrew Kaster	2b29e611fe	Meta: Rename Lagom library target names from LagomFoo to LibFoo This matches the target names for the main serenity build, and will make simplifying the Lagom build much easier going forward. The LagomFoo name came from a time when we had both library builds in the same CMake generated project and needed to deconflict the names.	2022-07-06 14:24:23 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses by String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00
Idan Horowitz	f4785e2468	LibUnicode: Generate data about DurationFormat-required units as well	2022-07-01 01:00:05 +03:00
Idan Horowitz	573061e76c	LibUnicode: Extract the timeSeparator numeric symbol from CLDR This will be used by Intl.DurationFormat	2022-07-01 01:00:05 +03:00
Sam Atkins	d564cf1e89	LibCore+Everywhere: Make Core::Stream read_line() return StringView Similar reasoning to making Core::Stream::read() return Bytes, except that every user of read_line() creates a StringView from the result, so let's just return one right away.	2022-04-16 13:27:51 -04:00
Sam Atkins	3b1e063d30	LibCore+Everywhere: Make Core::Stream::read() return Bytes A mistake I've repeatedly made is along these lines: ```c++ auto nread = TRY(source_file->read(buffer)); TRY(destination_file->write(buffer)); ``` It's a little clunky to have to create a Bytes or StringView from the buffer's data pointer and the nread, and easy to forget and just use the buffer. So, this patch changes the read() function to return a Bytes of the data that were just read. The other read_foo() methods will be modified in the same way in subsequent commits. Fixes #13687	2022-04-16 13:27:51 -04:00
Timothy Flynn	066352c9aa	LibJS+LibUnicode: Align ECMA-402 "sanctioned" terminology with UTS 35 This is an editorial change in the Intl spec. See: https://github.com/tc39/ecma402/commit/087995c https://github.com/tc39/ecma402/commit/233d29c This also adds a missing spec link for the sanctioned units and fixes a broken spec link for IsSanctionedSingleUnitIdentifier. In LibUnicode, the NumberFormat generator is updated to use the constexpr helper to retrieve sanctioned units.	2022-03-30 14:24:32 +01:00
Timothy Flynn	70ede2825e	LibUnicode: Use BCP 47 data to filter valid calendar names	2022-02-16 07:23:07 -05:00
Timothy Flynn	71d86261c3	LibUnicode: Use BCP 47 data to filter valid numbering system names There isn't too much of an effective difference here other than that the BCP 47 data contains some aliases we would otherwise not handle.	2022-02-16 07:23:07 -05:00
Timothy Flynn	63c3437274	LibUnicode: Use BCP 47 data to generate available calendars and numbers BCP 47 will be the single source of truth for known calendar and number system keywords, and their aliases (e.g. "gregory" is an alias for "gregorian"). Move the generation of available keywords to where we parse the BCP 47 data, so that hard-coded aliases may be removed from other generators.	2022-02-16 07:23:07 -05:00
Timothy Flynn	89ead8c00a	LibJS+LibUnicode: Parse Unicode keywords from the BCP 47 CLDR package We have a fair amount of hard-coded keywords / aliases that can now be replaced with real data from BCP 47. As a result, this also changes the awkward way we were previously generating keys. Before, we were more or less generating keywords as a CSV list of keys, e.g. for the "nu" key, we'd generate "latn,arab,grek" (ordered by locale preference). Then at runtime, we'd split on the comma. We now just generate spans of keywords directly.	2022-02-16 07:23:07 -05:00
Timothy Flynn	d0fc61e79b	LibUnicode: Extract the BCP 47 package from the CLDR This package was originally meant to be included in CLDR version 40, but was missed in their release scripts. This has been resolved: https://unicode-org.atlassian.net/browse/CLDR-15158 Unfortunately, the CLDR was re-released with the same version number. So to bust the build's CLDR cache, change the "version" used to detect that we need to redownload the CLDR.	2022-02-16 07:23:07 -05:00
thankyouverycool	0505e031f1	Meta+LibUnicode: Download and parse Unicode block properties This parses Blocks.txt for CharacterType properties and creates a global display array for use in apps.	2022-02-15 10:13:19 -05:00
Timothy Flynn	b52e592eac	LibUnicode: Port the CLDR time format generator to the stream API	2022-02-14 11:39:46 -05:00
Timothy Flynn	ca3bcf201f	LibUnicode: Port the CLDR date format generator to the stream API	2022-02-14 11:39:46 -05:00
Timothy Flynn	f39540876b	LibUnicode: Port the CLDR number format generator to the stream API	2022-02-14 11:39:46 -05:00
Timothy Flynn	a338e9403b	LibUnicode: Port the CLDR locale generator to the stream API This adds a generator utility to read an entire file and parse it as a JSON value. This is heavily used by the CLDR generators. The idea here is to put the file reading details in the utility so that when we have a good story for generically reading an entire stream in LibCore, we can update the generators to use that by only touching this helper.	2022-02-14 11:39:46 -05:00
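
Commit ea78bac36d above describes transpiling each CLDR plural rule condition into a C++ if-statement. As a rough illustration only, the "cs" cardinal rules quoted in that commit message might map to something like the hand-written sketch below. The names `Operands` and `cs_cardinal` are hypothetical, the local `PluralCategory` enum is a cut-down stand-in, and this is not the code the generator actually emits.

```cpp
#include <cstdint>

// Hypothetical, cut-down stand-in for the plural category enum.
enum class PluralCategory { One, Few, Many, Other };

// TR-35 plural operands used by the "cs" rules:
//   i = integer digits of the number
//   v = number of visible fraction digits
struct Operands {
    uint64_t i;
    uint32_t v;
};

// Hand-transpiled from the "cs" rules quoted in the commit message.
constexpr PluralCategory cs_cardinal(Operands ops)
{
    // "pluralRule-count-one": "i = 1 and v = 0"
    if (ops.i == 1 && ops.v == 0)
        return PluralCategory::One;
    // "pluralRule-count-few": "i = 2..4 and v = 0"
    if (ops.i >= 2 && ops.i <= 4 && ops.v == 0)
        return PluralCategory::Few;
    // "pluralRule-count-many": "v != 0"
    if (ops.v != 0)
        return PluralCategory::Many;
    // "pluralRule-count-other": no condition, matches everything else
    return PluralCategory::Other;
}
```

Under these assumptions, `cs_cardinal({ 3, 0 })` selects `Few` and `cs_cardinal({ 1, 1 })` (i.e. "1.5") selects `Many`. A lookup table keyed by locale would then point at one such function per rule set, as the commit message outlines.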


221 commits