Commit graph

68 commits

Author SHA1 Message Date
Timothy Flynn
89d1813b5d LibUnicode: Move CLDR data generators to a LibLocale subfolder
To prepare for placing all CLDR generated data in a new library,
LibLocale, this moves the code generators for the CLDR data to the
LibLocale subfolder.
2022-09-05 14:37:16 -04:00
Timothy Flynn
becec3578f LibTimeZone+LibUnicode: Generate string data with run-length encoding
Currently, the unique string lists are stored in the initialized data
sections of their shared libraries. In order to move the data to the
read-only section, generate the strings using RLE arrays.

We generate two arrays: the first is the RLE data itself, the second is
a list of indices into the RLE array for each string. We then generate a
decoding method to convert an RLE string to a StringView.
2022-08-16 16:56:17 +02:00
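The two-array layout described in becec3578f can be sketched as follows. This is a minimal illustration using the standard library, not the generator's actual output: `EncodedStrings`, `encode_strings`, and `decode_string` are hypothetical names, the (byte, run length) pair format is an assumption, and the decoder returns a `std::string` rather than a StringView.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical container for what a generator might emit.
struct EncodedStrings {
    std::vector<std::uint8_t> rle;    // flat (byte, run length) pairs for all strings
    std::vector<std::size_t> offsets; // offsets[i] = start of string i; last entry = end sentinel
};

EncodedStrings encode_strings(std::vector<std::string_view> const& strings)
{
    EncodedStrings out;
    for (auto s : strings) {
        out.offsets.push_back(out.rle.size());
        for (std::size_t i = 0; i < s.size();) {
            // Count how many times s[i] repeats, capped so the run fits in one byte.
            std::size_t run = 1;
            while (i + run < s.size() && s[i + run] == s[i] && run < 255)
                ++run;
            out.rle.push_back(static_cast<std::uint8_t>(s[i]));
            out.rle.push_back(static_cast<std::uint8_t>(run));
            i += run;
        }
    }
    out.offsets.push_back(out.rle.size());
    return out;
}

std::string decode_string(EncodedStrings const& data, std::size_t index)
{
    assert(index + 1 < data.offsets.size());
    std::string result;
    // Expand each (byte, run length) pair belonging to the requested string.
    for (auto i = data.offsets[index]; i < data.offsets[index + 1]; i += 2)
        result.append(static_cast<std::size_t>(data.rle[i + 1]), static_cast<char>(data.rle[i]));
    return result;
}
```

Both arrays hold only plain integers, which is what would allow a generated equivalent to be placed in a read-only section.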
Timothy Flynn
ae2acc8cdf LibJS+LibUnicode: Generate a set of default DateTimeFormat patterns
This isn't called out in TR-35, but before ICU even looks at CLDR data,
it adds a hard-coded set of default patterns to each locale's calendar.
It has done this since 2006 when its DateTimeFormat feature was first
created. Several test262 tests depend on this, which under ECMA-402,
falls into "implementation defined" behavior. For compatibility, we
can do the same in LibUnicode.
2022-07-22 23:51:56 +01:00
Timothy Flynn
32c07bc6c3 LibUnicode: Generate per-locale data for the "noon" fixed day period
Note that not all locales have this day period.
2022-07-21 20:36:03 +01:00
Timothy Flynn
16b673eaa9 LibUnicode: Check whether a calendar symbol for a locale actually exists
In the generated unique string list, index 0 is the empty string, and is
used to indicate a value doesn't exist in the CLDR. Check for this
before returning an empty calendar symbol.

For example, an upcoming commit will add the fixed day period "noon",
which not all locales support.
2022-07-21 20:36:03 +01:00
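A minimal sketch of the existence check described in 16b673eaa9, assuming a small hand-written string list. The names `s_string_list` and `lookup_symbol` are hypothetical, as are the list contents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical unique string list; index 0 is reserved for "no value in the CLDR".
constexpr std::array<std::string_view, 3> s_string_list { "", "AM", "PM" };

std::optional<std::string_view> lookup_symbol(std::size_t string_index)
{
    assert(string_index < s_string_list.size());
    // Index 0 means the symbol does not exist, so report "no value"
    // instead of handing back an empty string.
    if (string_index == 0)
        return std::nullopt;
    return s_string_list[string_index];
}
```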
Timothy Flynn
0f26ab89ae LibJS+LibUnicode: Handle flexible day periods on both sides of midnight
Commit ec7d535 only partially handled the case of flexible day periods
rolling over midnight, in that it only worked for hours after midnight.
For example, the en locale defines a day period range of [21:00, 06:00).
The previous method of adding 24 hours to the given hour would change
e.g. 23:00 to 47:00, which isn't valid.
2022-07-21 20:36:03 +01:00
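One way to test a half-open hour range that may wrap past midnight, without adding 24 to the hour, is sketched below. This illustrates the problem described in 0f26ab89ae and is not necessarily how that commit solves it; `hour_in_period` is a hypothetical helper.

```cpp
#include <cassert>

// Returns true when `hour` lies in the half-open range [begin, end).
// The range may wrap past midnight (begin > end). Hours are 0..23.
bool hour_in_period(int hour, int begin, int end)
{
    assert(0 <= hour && hour < 24);
    if (begin <= end)
        return begin <= hour && hour < end;
    // Wrapping range, e.g. [21:00, 06:00): late evening OR early morning.
    return hour >= begin || hour < end;
}
```

With the en range [21:00, 06:00), both 23:00 and 05:00 fall inside the period, while 06:00 and 12:00 do not.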
Timothy Flynn
b24b9c0a65 LibUnicode: Fall back to per-locale default calendars
When patterns, symbols, etc. for a requested calendar are not found, use
the locale's default calendar.
2022-07-15 12:31:43 +02:00
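The fallback in b24b9c0a65 might look roughly like the sketch below. `LocaleData` and `find_pattern` are hypothetical, and a single pattern string per calendar stands in for the real generated data.

```cpp
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, heavily simplified per-locale data.
struct LocaleData {
    std::string default_calendar;
    std::map<std::string, std::string> patterns_by_calendar;
};

std::optional<std::string_view> find_pattern(LocaleData const& locale, std::string_view calendar)
{
    auto it = locale.patterns_by_calendar.find(std::string(calendar));
    // Requested calendar has no data: retry with the locale's default calendar.
    if (it == locale.patterns_by_calendar.end())
        it = locale.patterns_by_calendar.find(locale.default_calendar);
    if (it == locale.patterns_by_calendar.end())
        return std::nullopt;
    return std::string_view { it->second };
}
```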
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
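The difference described in 3f3f45580a can be illustrated with the standard library's analogous `std::string_view` and its `operator""sv`. This is an analogy only; AK's StringView is a separate type, and `from_pointer` and `from_literal` are hypothetical names.

```cpp
#include <string_view>

using namespace std::string_view_literals;

// Built from a pointer: the constructor must scan for the terminating NUL
// to find the length.
std::string_view from_pointer(char const* s) { return std::string_view { s }; }

// Built with the sv suffix: the length comes from the literal itself,
// so no scan is needed.
constexpr std::string_view from_literal() { return "LibUnicode"sv; }

static_assert(from_literal().size() == 10);
```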
sin-ack
7456904a39 Meta+Userland: Simplify some formatters
These are mostly minor mistakes I've encountered while working on the
removal of StringView(char const*). The usage of builder.put_string over
Format<FormatString>::format is preferable as it will avoid the
indirection altogether when there's no formatting to be done. Similarly,
there is no need to do format(builder, "{}", number) when
builder.put_u64(number) works equally well.

Additionally a few Strings where only constant strings were used are
replaced with StringViews.
2022-07-12 23:11:35 +02:00
Timothy Flynn
12e7c0808a LibUnicode: Generate per-region week data
This includes:
* The minimum number of days in a week for that week to count as the
  first week of a new year.
* The day to be shown as the first day of the week in a calendar.
* The start/end days of the weekend.

Like the existing hour cycle data, week data is presented per-region in
the CLDR, rather than per-locale. The method to add likely subtags to a
locale to perform region lookups is the same.

The list of regions in the CLDR for hour cycle, minimum days, first day,
and weekend days are quite different. So rather than changing the
existing HourCycleRegion enum to a generic Region enum, we generate
separate enums for each of the week data fields. This allows each lookup
into these fields to remain simple array-based index access, without any
"jumps" for regions that don't have CLDR data for a field.
2022-07-06 16:56:42 +02:00
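The "separate enum per field" design in 12e7c0808a can be sketched as below. All enum names, region lists, and table values here are illustrative assumptions, not the generated data.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

enum class Weekday : std::uint8_t { Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday };

// Hypothetical: each field gets its own enum listing only the regions that
// have data for that field, so every table below is dense.
enum class FirstDayRegion : std::uint8_t { World, US };
enum class MinimumDaysRegion : std::uint8_t { World, DE };

// Illustrative placeholder values, indexed directly by the matching enum.
constexpr std::array<Weekday, 2> s_first_day { Weekday::Monday, Weekday::Sunday };
constexpr std::array<std::uint8_t, 2> s_minimum_days { 1, 4 };

constexpr Weekday first_day(FirstDayRegion region)
{
    return s_first_day[static_cast<std::size_t>(region)];
}

constexpr std::uint8_t minimum_days(MinimumDaysRegion region)
{
    return s_minimum_days[static_cast<std::size_t>(region)];
}
```

A single shared Region enum would instead force each table to carry placeholder entries for regions that lack data for that field.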
DexesTTP
7ceeb74535 AK: Use an enum instead of a bool for String::replace(all_occurences)
This commit has no behavior changes.

In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurrence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
2022-07-06 11:12:45 +02:00
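A standalone sketch of the enum-instead-of-bool idea from 7ceeb74535, written against `std::string` rather than AK's String. Only `ReplaceMode::FirstOnly` is named in the commit message; the `All` enumerator and the free function `replace` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

enum class ReplaceMode {
    All,       // assumed name for "replace every occurrence"
    FirstOnly, // named in the commit message
};

std::string replace(std::string_view haystack, std::string_view needle, std::string_view replacement, ReplaceMode mode)
{
    if (needle.empty())
        return std::string(haystack);

    std::string result;
    std::size_t position = 0;
    while (true) {
        auto found = haystack.find(needle, position);
        if (found == std::string_view::npos)
            break;
        // Copy the text before the match, then the replacement.
        result.append(haystack.substr(position, found - position));
        result.append(replacement);
        position = found + needle.size();
        if (mode == ReplaceMode::FirstOnly)
            break;
    }
    // Copy whatever remains after the last handled match.
    result.append(haystack.substr(position));
    return result;
}
```

A call site such as `replace(text, "-", "+", ReplaceMode::FirstOnly)` states its intent, whereas a bare `false` argument does not.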
Timothy Flynn
70ede2825e LibUnicode: Use BCP 47 data to filter valid calendar names
2022-02-16 07:23:07 -05:00
Timothy Flynn
63c3437274 LibUnicode: Use BCP 47 data to generate available calendars and numbers
BCP 47 will be the single source of truth for known calendar and number
system keywords, and their aliases (e.g. "gregory" is an alias for
"gregorian"). Move the generation of available keywords to where we
parse the BCP 47 data, so that hard-coded aliases may be removed from
other generators.
2022-02-16 07:23:07 -05:00
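Alias resolution from a single table, as 63c3437274 describes, might be sketched like this. `resolve_keyword_alias` is a hypothetical helper, and only the "gregory" entry comes from the commit message.

```cpp
#include <map>
#include <string_view>

// Returns the keyword that an alias stands for, or the input unchanged
// when it is not a known alias.
std::string_view resolve_keyword_alias(std::string_view keyword)
{
    // Hypothetical table; a generator would fill it from the BCP 47 data.
    static std::map<std::string_view, std::string_view> const aliases {
        { "gregory", "gregorian" },
    };

    auto it = aliases.find(keyword);
    return it == aliases.end() ? keyword : it->second;
}
```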
Timothy Flynn
ca3bcf201f LibUnicode: Port the CLDR date format generator to the stream API
2022-02-14 11:39:46 -05:00
Timothy Flynn
6efbafa6e0 Everywhere: Update copyrights with my new serenityos.org e-mail :^)
2022-01-31 18:23:22 +00:00
Timothy Flynn
ebd33e580b LibUnicode: Generate a list of available calendars
2022-01-31 00:32:41 +00:00
Timothy Flynn
589e7354fb LibUnicode: Remove extraneous semi-colons at end of generator functions
2022-01-27 21:16:44 +00:00
Timothy Flynn
4400150cd2 LibJS+LibUnicode: Return the appropriate time zone name depending on DST
2022-01-19 21:20:41 +00:00
Timothy Flynn
bf677eb485 LibUnicode: Generate both standard and daylight time zone names
While LibTimeZone didn't support DST, we only generated one of them,
preferring the standard name. Now that DST can be tested, generate both
names.
2022-01-19 21:20:41 +00:00
Timothy Flynn
bdf02c21e1 LibUnicode: Swap the preferred order of standard time zone display names
Our generator is currently preferring the DST variant of the time zone
display names over the non-DST variant. LibTimeZone currently does not
have DST support, and operates in a mode that basically assumes DST does
not exist. Swap the display names for now just to be consistent until we
have DST support.

Note we will need to generate both of these variants and select the
appropriate one at runtime once we have DST support.
2022-01-12 15:43:12 +01:00
Timothy Flynn
e2dfbe8f67 LibUnicode: Parse and generate long and short generic time zone names
This implements the CalendarPatternStyle::{Long,Short}Generic styles of
time zone name formatting.
2022-01-11 23:56:35 +01:00
Timothy Flynn
8d35563f28 LibUnicode: Implement TR-35's localized GMT offset formatting
This adds an API to use LibTimeZone to convert a time zone such as
"America/New_York" to a GMT offset string like "GMT-5" (short form) or
"GMT-05:00" (long form).
2022-01-11 23:56:35 +01:00
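The short and long forms mentioned in 8d35563f28 can be sketched for the English "GMT" pattern only. This is a simplification: `format_gmt_offset` and `OffsetStyle` are hypothetical, the offset is passed in directly instead of being looked up through LibTimeZone, and the handling of zero offsets and of non-zero minutes in the short form is an assumption.

```cpp
#include <cstdlib>
#include <string>

enum class OffsetStyle { Short, Long };

// Formats a UTC offset given in seconds, e.g. -18000 for five hours behind.
std::string format_gmt_offset(int offset_seconds, OffsetStyle style)
{
    if (offset_seconds == 0)
        return "GMT";

    char sign = offset_seconds < 0 ? '-' : '+';
    int total_minutes = std::abs(offset_seconds) / 60;
    int hours = total_minutes / 60;
    int minutes = total_minutes % 60;

    // Zero-pads a value to two digits.
    auto two_digits = [](int value) { return (value < 10 ? "0" : "") + std::to_string(value); };

    std::string result = "GMT";
    result += sign;

    if (style == OffsetStyle::Long)
        return result + two_digits(hours) + ":" + two_digits(minutes);

    // Short form: no zero padding on the hour, minutes only when non-zero.
    result += std::to_string(hours);
    if (minutes != 0)
        result += ":" + two_digits(minutes);
    return result;
}
```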
Timothy Flynn
b543c3e490 Meta: Don't assume how each generator wants to generate keyed map names
The generate_mapping helper generates a series of structs like:

    Array<SomeType, 1> s_mapping_key_0 {};
    Array<SomeType, 2> s_mapping_key_1 {};
    Array<SomeType, 3> s_mapping_key_2 {};
    Array<Span<SomeType const>> s_mapping { {
        s_mapping_key_0.span(),
        s_mapping_key_1.span(),
        s_mapping_key_2.span(),
    } };

Where the names of the struct were generated by the format_mapping_name
lambda inside the helper. Rather than this lambda making assumptions on
how each generator wants to name its structs, add a parameter for the
caller to provide a naming formatter.

This is because the TimeZoneData generator will want pretty specific
identifier formatting rules.
2022-01-11 00:36:45 +01:00
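The change in b543c3e490 amounts to passing the naming rule in as a parameter. The sketch below is a simplified stand-in that emits text for the per-key arrays only; this `generate_mapping` signature is an assumption and does not match the real helper.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified helper: emits one array declaration per key and
// asks the caller how each array should be named.
std::string generate_mapping(
    std::string const& type,
    std::vector<std::size_t> const& sizes,
    std::function<std::string(std::size_t)> const& format_mapping_name)
{
    std::string out;
    for (std::size_t key = 0; key < sizes.size(); ++key) {
        out += "Array<" + type + ", " + std::to_string(sizes[key]) + "> "
            + format_mapping_name(key) + " {};\n";
    }
    return out;
}
```

A caller with its own identifier rules supplies a different lambda, for example one that derives the name from a time zone identifier instead of the numeric key.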
Timothy Flynn
498b741434 LibUnicode: Use LibTimeZone's list of time zone names
LibUnicode no longer needs to generate a list of time zone names that it
parsed from metaZones.json. We can defer to the TZDB for a golden list
of time zones.
2022-01-08 12:45:34 +01:00
Timothy Flynn
ca9123f66f LibUnicode: Rename DateTimeFormat's generator's TimeZone struct
Before using LibTimeZone within LibUnicode, rename this structure to
avoid naming conflicts with the TimeZone namespace.
2022-01-08 12:45:34 +01:00
Timothy Flynn
6d7d9dd324 LibUnicode: Do not assume time zones & meta zones have a 1-to-1 mapping
The generator parses metaZones.json to form a mapping of meta zones to
time zones (AKA "golden zone" in TR-35). This parser errantly assumed
this was a 1-to-1 mapping.
2022-01-06 22:28:01 +01:00
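A sketch of a parser that does not assume a 1-to-1 mapping, as in 6d7d9dd324. It assumes that one meta zone may be associated with several time zones; `MetaZoneMap` and `build_meta_zone_map` are hypothetical, and the input is a pre-parsed list of pairs rather than metaZones.json itself.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using MetaZoneMap = std::map<std::string, std::vector<std::string>>;

// Each entry is a (meta zone, time zone) pair.
MetaZoneMap build_meta_zone_map(std::vector<std::pair<std::string, std::string>> const& entries)
{
    MetaZoneMap map;
    for (auto const& [meta_zone, time_zone] : entries)
        map[meta_zone].push_back(time_zone); // append, rather than overwrite an earlier entry
    return map;
}
```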
Timothy Flynn
62d8d1fdfd LibUnicode: Move UTC verification to the scope that requires it
In Unicode::get_time_zone_name(), we don't need to require that the time
zone is UTC for long- and short-style name lookups. This is required for
other styles, because they will depend on TZDB data - so move the VERIFY
to that scope.
2022-01-06 22:28:01 +01:00
Timothy Flynn
ec7d5351ed LibJS+LibUnicode: Handle flexible day periods that roll over midnight
When searching for the locale-specific flexible day period for a given
hour, we were neglecting to handle cases where the period crosses 00:00.
For example, the en locale defines a day period range of [21:00, 06:00).
When given the hour of 05:00, we were checking if (21 <= 5 && 5 < 6),
thus not recognizing that the hour falls in that period.
2022-01-05 16:22:55 +01:00
Timothy Flynn
ba4cdf34f8 LibUnicode: Convert UnicodeDateTimeFormat to link with weak symbols
2022-01-04 22:49:43 +00:00
Timothy Flynn
126a3fe180 LibUnicode: Add minimal support for generic & offset-based time zones
ECMA-402 now supports short-offset, long-offset, short-generic, and
long-generic time zone name formatting. For example, in the en-US locale
the America/Eastern time zone would be formatted as:

    short-offset: GMT-5
    long-offset: GMT-05:00
    short-generic: ET
    long-generic: Eastern Time

We currently only support the UTC time zone, however. Therefore, this
very minimal implementation does not consider GMT offset or generic
display names. Instead, the CLDR defines specific strings for UTC.
2022-01-03 15:11:59 +01:00
Timothy Flynn
15e1498419 LibUnicode: Dynamically load the generated UnicodeDateTimeFormat symbols
2021-12-21 13:09:49 -08:00
Michel Hermier
060e5ccbbc Lagom: Bind time_zone_list_index_type in the generator
The variable `s_time_zone_list_index_type` seems to be unused (detected
when compiling with clang), and it seems logical to bind it even if it
is not used for now.
2021-12-18 21:01:10 -08:00
Timothy Flynn
6e5f0b139b LibUnicode: Remove unused fields from generated structures
A couple of structures held a string index that is unused. Removing them
also removes the string values from the unique string list.
2021-12-13 21:28:56 -08:00
Timothy Flynn
77fc877c04 LibUnicode: Generate unique lists of hour cycles
2021-12-13 21:28:56 -08:00
Timothy Flynn
6f17696176 LibUnicode: Generate unique lists of time zone structures
2021-12-13 21:28:56 -08:00
Timothy Flynn
df33156462 LibUnicode: Generate unique lists of day period structures
2021-12-13 21:28:56 -08:00
Timothy Flynn
265785e847 LibUnicode: Generate unique day period structures
2021-12-13 21:28:56 -08:00
Timothy Flynn
7af1818e76 LibUnicode: Generate unique time zone structures
Each of the 374 locales contains 156 time zone structures. Of these
58,344 structures, 13,578 are unique.
2021-12-13 21:28:56 -08:00
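The de-duplication behind the "unique structures" commits in this series can be sketched with a small index-returning store. `UniqueStorage` here is a hypothetical standalone class that requires `operator<` on the stored type; it is not a copy of the generator's own utility.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

template<typename T>
class UniqueStorage {
public:
    // Returns the index of `value`, storing it only if it has not been seen before.
    std::size_t ensure(T value)
    {
        auto [it, inserted] = m_indices.try_emplace(value, m_values.size());
        if (inserted)
            m_values.push_back(std::move(value));
        return it->second;
    }

    T const& get(std::size_t index) const { return m_values[index]; }
    std::size_t size() const { return m_values.size(); }

private:
    std::map<T, std::size_t> m_indices; // value -> index into m_values
    std::vector<T> m_values;            // unique values in first-seen order
};
```

Each locale then stores small indices instead of full structures, so repeated structures cost one index each.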
Timothy Flynn
b14b37f386 LibUnicode: Generate unique calendar structures
Of the 374 generated calendars, 173 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
4b721597d7 LibUnicode: Generate unique lists of calendar range patterns
Of the 374 range pattern lists and 374 range12 pattern lists, 230 are
unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
9fc2442e7d LibUnicode: Generate unique lists of calendar patterns
Of the 374 generated lists, 152 are unique. These lists have upwards of
1000 entries as well, so the de-duplication is particularly nice.
2021-12-13 21:28:56 -08:00
Timothy Flynn
09547f4084 LibUnicode: Generate unique lists of calendar symbols structures
Of the 374 generated lists, 120 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
f681ec9d98 LibUnicode: Generate unique calendar symbols structures
Each of the 374 generated calendars includes 4 symbols structures. Of
these 1496 structures, only 386 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
62ff029890 LibUnicode: Generate CalendarSymbols in a predetermined order
Similar to commit 2a7f36b392, this change moves the generated
CalendarSymbol enumeration to the public LibUnicode/NumberFormat.h
header with a pre-defined set of symbols that we need. This is to
prepare for uniquely generating the CalendarSymbols structure.
2021-12-13 21:28:56 -08:00
Timothy Flynn
cf8ef954e5 LibUnicode: Generate unique lists of calendar symbols
Each of the 374 generated calendars includes 4 sets of symbols, each of
which has 3 lists of symbols (narrow, short, long). Of these 4488
lists, only 819 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
af7caa97c8 LibUnicode: Generate unique calendar format structures
There are currently 374 calendars generated, each of which includes 3
CalendarFormat structures. Of these 1122 instances, only 167 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
a417c23de0 LibUnicode: Parse and generate per-locale day period ranges
2021-12-10 21:27:24 +00:00
Timothy Flynn
fa8e881cfa LibUnicode: Parse and generate secondary day period symbols
Generate morning2, afternoon2, evening2, and night2 symbols.
2021-12-10 21:27:24 +00:00
Timothy Flynn
76aab821f4 LibJS+LibUnicode: Rename some Unicode::DayPeriod values
In the CLDR, there aren't "night" values, there are "night1" & "night2"
values. This is for locales which use a different name for nighttime
depending on the hour. For example, the ja locale uses "夜" between the
hours of 19:00 and 23:00, and "夜中" between the hours of 23:00 and
04:00. Our CLDR parser is currently ignoring "night2", so this rename
is to prepare for that.

We could probably come up with better names, but in the end, the API in
LibUnicode will be such that outside callers won't even see Night1, etc.
2021-12-10 21:27:24 +00:00
Timothy Flynn
9d4c4303fd LibUnicode: Parse and generate date time range format patterns
2021-12-09 23:43:04 +00:00