beenull/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-22 15:40:19 +00:00

Author	SHA1	Message	Date
Timothy Flynn	9220a89d2f	CI+LibUnicode: Remove the UCD from the system	2024-06-22 14:56:39 +02:00
Timothy Flynn	069bed5d47	LibUnicode+LibGfx: Remove superfluous emoji metadata For SerenityOS, we parse emoji metadata from the UCD to learn emoji groups, subgroups, names, etc. We used this information only in the emoji picker dialog. It is entirely unused within Ladybird. This removes our dependence on the UCD emoji file, as we no longer need any of its information. All we need to know is the file path to our custom emoji, which we get from Meta/emoji-file-list.txt.	2024-06-22 14:56:39 +02:00
Timothy Flynn	aa3a30870b	LibUnicode: Replace code point bidirectional classes with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	e77dafc987	LibUnicode: Replace code point scripts and script extensions with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	986ff984cc	LibUnicode: Replace code point general categories with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	c804bda5fd	LibUnicode: Replace code point properties with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	ab56b8c8dc	LibUnicode: Remove the locale-unaware text segmentation implementation	2024-06-20 13:46:54 +02:00
Timothy Flynn	5cf818e305	LibUnicode: Replace case transformations and comparison with ICU There are a couple of differences here due to using ICU: 1. Titlecasing behaves slightly differently. We previously transformed "123dollars" to "123Dollars", as we would use word segmentation to split a string into words, then transform the first cased character to titlecase. ICU doesn't go quite that far, and leaves the string as "123dollars". While this is a behavior change, the only user of this API is the `text-transform: capitalize;` CSS rule, and we now match the behavior of other browsers. 2. There isn't an API to compare strings with case insensitivity without allocating case-folded strings for both the left- and right-hand-side strings. Our implementation was previously allocation-free; however, in a benchmark, ICU is still ~1.4x faster.	2024-06-20 10:59:55 +02:00
Timothy Flynn	8d7216f4e0	LibUnicode: Replace IDNA ASCII conversion with ICU	2024-06-18 21:07:56 +02:00
Timothy Flynn	1feef17bf7	LibUnicode: Remove completely unused code point name & block name data These were used for e.g. the Character Map on Serenity, but are not used at all for Ladybird.	2024-06-18 21:07:56 +02:00
Timothy Flynn	4ce7f49f2f	Meta: Use SHA-256 verification for downloaded UCD files	2024-05-24 08:47:26 -04:00
Timothy Flynn	91cd43a7ac	Meta: Add a file containing a list of all emoji file names And add a verification step to the emoji data generator to ensure all emoji are listed in this file. This file will be used as a sources list in both the CMake and GN build systems. It is probably possible to generate this list. But in a first attempt, the CMake code to set the file as a dependency of a pseudo target, which would then parse the file and install the listed emoji was getting quite verbose and complicated. So for now, let's just maintain this list.	2024-03-23 17:26:31 -04:00
Andrew Kaster	e0f990f1cb	CMake: Don't download IDNA files when ENABLE_NETWORK_DOWNLOAD is OFF Also tweak the debug message for the Emoji test file.	2023-12-13 10:51:27 -07:00
Simon Wanner	7d9fe44039	LibUnicode: Download and parse IDNA data	2023-12-10 08:04:58 -05:00
Timothy Flynn	139c575cc9	LibUnicode: Update to Unicode version 15.1.0 https://unicode.org/versions/Unicode15.1.0/ This update includes a new set of code point properties, Indic Conjunct Break. These may have the values Consonant, Linker, or Extend. These are used in text segmentation to prevent breaking on some extended grapheme cluster sequences.	2023-09-15 18:30:26 +02:00
Andrew Kaster	941a9846a3	Meta: Assume files already extracted for ENABLE_NETWORK_DOWNLOADS=OFF This allows external meta build systems to extract a cached archive into our Cache directory without having to also copy the .tar.gz file.	2023-08-10 20:10:05 -06:00
Timothy Flynn	fd1fbad1d2	LibGfx+LibUnicode: Support specifying the path to search for emoji Similar to the FontDatabase, this will be needed for Ladybird to find emoji images. We now generate just the file name of emoji image in LibUnicode, and look for that file in the specified path (defaulting to /res/emoji) at runtime.	2023-03-01 14:54:16 +00:00
Timothy Flynn	8c38d46c1a	LibUnicode: Generate the path to emoji images alongside emoji data This will provide for quicker emoji lookups, rather than having to discover and allocate these paths at runtime before we find out if they even exist.	2023-02-24 19:48:47 +01:00
Timothy Flynn	8f2589b3b0	LibUnicode: Parse and generate case folding code point data Case folding rules have a similar mapping style as special casing rules, where one code point may map to zero or more case folding rules. These will be used for case-insensitive string comparisons. To see how case folding can differ from other casing rules, consider "ß" (U+00DF): >>> "ß".lower() 'ß' >>> "ß".upper() 'SS' >>> "ß".title() 'Ss' >>> "ß".casefold() 'ss'	2023-01-18 14:43:40 +00:00
Timothy Flynn	2334b4cebd	Meta: Move UCD/CLDR/TZDB downloaded artifacts to Build/caches They currently reside under Build/<arch>, meaning that they would be redownloaded for each architecture/toolchain build combo. Move them to a location that can be re-used for all builds.	2022-12-24 09:46:28 -05:00
Timothy Flynn	ed84a6f6ee	LibUnicode: Use www.unicode.org domain to download emoji-test.txt The non-www domain does not appear to be available now. We use the www domain for UCD.zip already. Co-authored-by: Stephan Unverwerth <s.unverwerth@serenityos.org>	2022-12-20 12:09:16 -05:00
Timothy Flynn	bd592480e4	Meta: Replace Bash script for generating emoji.txt with C++ generator We currently have two build-time parsers for the UCD's emoji-test.txt file. To prepare for future changes, this removes the Bash parser and moves its functionality to the newer C++ parser.	2022-10-27 12:59:56 +02:00
Timothy Flynn	9fad23018a	Meta: Remove unused "prefix" variable from invoke_generator() helper This became unused after `1ae0cfd`.	2022-10-16 21:16:48 +02:00
Andrew Kaster	1ae0cfd08b	CMake+Userland: Use CMakeLists from Userland to build Lagom Libraries Also do this for Shell. This greatly simplifies the CMakeLists in Lagom, replacing many glob patterns with a big list of libraries. There are still a few special libraries that need some help to conform to the pattern, like LibELF and LibWebView. It also lets us remove essentially all of the Serenity or Lagom binary directory detection logic from code generators, as now both projects directories enter the generator logic from the same place.	2022-10-16 16:36:39 +02:00
Timothy Flynn	51854e345a	LibUnicode: Update to Unicode version 15.0.0 https://unicode.org/versions/Unicode15.0.0/	2022-09-21 14:04:22 +01:00
Timothy Flynn	b7ef36aa36	LibUnicode: Parse and generate custom emoji added for SerenityOS Parse emoji from emoji-serenity.txt to allow displaying their names and grouping them together in the EmojiInputDialog. This also adds an "Unknown" value to the EmojiGroup enum. This will be useful for emoji that aren't found in the UCD, or for when UCD downloads are disabled.	2022-09-11 20:33:57 +01:00
Timothy Flynn	b61eca0a1e	LibUnicode: Parse and generate emoji code point data According to TR #51, the "best definition of the full set [of emojis] is in the emoji-test.txt file". This defines not only the emoji themselves, but the order in which they should be displayed, and what "group" of emojis they belong to.	2022-09-08 23:12:31 +01:00
Timothy Flynn	89d1813b5d	LibUnicode: Move CLDR data generators to a LibLocale subfolder To prepare for placing all CLDR generated data in a new library, LibLocale, this moves the code generators for the CLDR data to the LibLocale subfolder.	2022-09-05 14:37:16 -04:00
Tim Schumacher	854792c340	Meta: Don't generate emoji.txt into the source tree	2022-09-05 09:50:31 -04:00
Timothy Flynn	2e0b20ef01	Meta: Only run the emoji generator for Serenity builds It is not needed on Lagom, and was incidentally run twice.	2022-08-23 19:03:43 +01:00
Timothy Flynn	6dd8161002	Meta: Ensure the emoji generator depends on its own script If the script changes, it better be re-run.	2022-08-23 19:03:43 +01:00
Timothy Flynn	d86b25c460	Meta: Move downloading of emoji-test.txt to unicode_data.cmake The current emoji_txt.cmake does not handle download errors (which were a common source of issues in the build problems channel) or Unicode versioning. These are both handled by unicode_data.cmake. Move the download to unicode_data.cmake so that we can more easily handle next month's Unicode 15 release.	2022-08-22 16:00:29 +01:00
Timothy Flynn	ea78bac36d	LibUnicode: Parse and generate per-locale plural rules from the CLDR Plural rules in the CLDR are of the form: "cs": { "pluralRule-count-one": "i = 1 and v = 0 @integer 1", "pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4", "pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...", "pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..." } The syntax is described here: https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax There are up to 2 sets of rules for each locale, a cardinal set and an ordinal set. The approach here is to generate a C++ function for each set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is transpiled to a C++ if-statement within its function. Then lookup tables are generated to match locales to their generated functions. NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile flags because the generated plural rules have lots of extra parentheses (because e.g. we need to selectively negate and combine rules). The code to generate only exactly the right number of parentheses is quite hairy, so this just tells the compiler to ignore the extras.	2022-07-08 11:51:54 +02:00
Timothy Flynn	1f2542247f	LibUnicode: Upgrade to CLDR version 41.0.0 Release notes: https://cldr.unicode.org/index/downloads/cldr-41 Note that the HourCycleRegion enum now contains 272 entries, and thus needs to be bumped from u8 to u16.	2022-04-07 08:29:10 -04:00
Timothy Flynn	8a46794ff8	LibUnicode: Replace individual UCD file downloads with single UCD.zip Instead of downloading nearly 20 files individually, we can download a single .zip file similar to how we download a single CLDR .zip. This is to reduce the number of connections/downloads to/from unicode.org.	2022-04-06 17:12:08 -07:00
Brian Gianforcaro	66e7ac1954	Meta: Error out on find_program errors with CMake less than 3.18 We have seen some cases where the build fails for folks, and they are missing unzip/tar/gzip etc. We can catch some of these in CMake itself, so let's make sure to handle that uniformly across the build system. The REQUIRED flag to `find_program` was only added in CMake 3.18, so we can't rely on that to actually halt the program execution.	2022-03-19 15:01:22 -07:00
Timothy Flynn	d0fc61e79b	LibUnicode: Extract the BCP 47 package from the CLDR This package was originally meant to be included in CLDR version 40, but was missed in their release scripts. This has been resolved: https://unicode-org.atlassian.net/browse/CLDR-15158 Unfortunately, the CLDR was re-released with the same version number. So to bust the build's CLDR cache, change the "version" used to detect that we need to redownload the CLDR.	2022-02-16 07:23:07 -05:00
thankyouverycool	0505e031f1	Meta+LibUnicode: Download and parse Unicode block properties This parses Blocks.txt for CharacterType properties and creates a global display array for use in apps.	2022-02-15 10:13:19 -05:00
Idan Horowitz	2d50c08f34	LibUnicode: Download and parse {Grapheme,Word,Sentence} break props	2022-01-31 21:05:04 +02:00
Timothy Flynn	27eda77c97	LibUnicode: Create a nearly empty generator for relative-time formatting This sets up the generator plumbing to create the relative-time data files. This data could probably be included in the date-time generator, but that generator is large enough that I'd rather put this tangentially related data in its own file.	2022-01-27 21:16:44 +00:00
Timothy Flynn	c7ef86f5d9	Meta: Download UCD and CLDR data with fallible download function	2022-01-26 00:22:53 +00:00
Timothy Flynn	c5138f0f2b	LibUnicode: Parse number system digits from the CLDR We had a hard-coded table of number system digits copied from ECMA-402. Turns out these digits are in the CLDR, so let's parse the digits from there instead of hard-coding them.	2022-01-12 10:49:07 +01:00
Timothy Flynn	9ba386a7bb	Meta: Move invoke_generator to utils.cmake	2022-01-08 12:45:34 +01:00
Timothy Flynn	d5f14b5ff9	Meta: Move remove_unicode_data_if_version_changed to utils.cmake This function will be used by the time zone database parser. Move it to the common utilities file, and rename it remove_path_if_version_changed to be more generic.	2022-01-08 12:45:34 +01:00
Timothy Flynn	71903ea7e1	LibUnicode: Parse and generate calendar (ca) Unicode keywords Also removes a few fly-by "StringView x = nullptr;" unnecessary initializers.	2021-11-29 22:48:46 +00:00
Timothy Flynn	48ce72e472	LibUnicode: Parse and generate regional hour cycles Unlike most data in the CLDR, hour cycles are not stored on a per-locale basis. Instead, they are keyed by a string that is usually a region, but sometimes is a locale. Therefore, given a locale, to determine the hour cycles for that locale, we: 1. Check if the locale itself is assigned hour cycles. 2. If the locale has a region, check if that region is assigned hour cycles. 3. Otherwise, maximize that locale, and if the maximized locale has a region, check if that region is assigned hour cycles. 4. If the above all fail, fallback to the "001" region. Further, each locale's default hour cycle is the first assigned hour cycle.	2021-11-29 22:48:46 +00:00
Timothy Flynn	5c57341672	LibUnicode: Create a nearly empty generator for date-time formatting Similar to number formatting, the data for date-time formatting will be located in its own generated file. This extracts the cldr-dates package from the CLDR and sets up the generator plumbing to create the date-time data files.	2021-11-29 22:48:46 +00:00
Timothy Flynn	1539ed12f1	LibUnicode: Functionalize the Unicode generator CMake commands Makes it a bit easier to add a new generator.	2021-11-23 22:58:05 +01:00
Ben Wiederhake	b06b54772e	Meta+LibUnicode: Provide code point names through library	2021-11-20 00:31:55 +01:00
Timothy Flynn	4b535ce1c8	LibUnicode: Stop passing the cldr-core package to UnicodeNumberFormat This is no longer needed now that this generator isn't parsing the default-content locales.	2021-11-19 11:45:35 +01:00
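
The four-step hour cycle lookup described in 48ce72e472 can be sketched roughly as follows. This is an illustration only, not the LibUnicode implementation: the table contents, the `LIKELY_REGION` stand-in for locale maximization, and all function names are hypothetical, and locales are assumed to have the simple form "language" or "language-REGION". Since a locale that already has a region keeps it when maximized, the sketch only maximizes locales without one.

```python
# Made-up sample data for illustration; these are NOT real CLDR values.
# Keys are usually regions, but may also be full locales.
HOUR_CYCLES = {
    "001": ["h23", "h12"],   # world fallback region
    "US": ["h12", "h23"],
    "fr-CA": ["h23"],        # an entry keyed by locale rather than region
}

# Hypothetical stand-in for likely-subtag maximization (language -> region).
LIKELY_REGION = {"en": "US", "ja": "JP"}


def hour_cycles_for(locale):
    """Return the hour cycles assigned to a locale, following the four steps."""
    # 1. Check if the locale itself is assigned hour cycles.
    if locale in HOUR_CYCLES:
        return HOUR_CYCLES[locale]

    language, _, region = locale.partition("-")

    # 2. If the locale has a region, check that region.
    # 3. Otherwise, maximize the locale and check the resulting region.
    if not region:
        region = LIKELY_REGION.get(language, "")
    if region in HOUR_CYCLES:
        return HOUR_CYCLES[region]

    # 4. If all of the above fail, fall back to the "001" region.
    return HOUR_CYCLES["001"]


def default_hour_cycle(locale):
    """The default hour cycle is the first assigned hour cycle."""
    return hour_cycles_for(locale)[0]
```

With the sample table, `default_hour_cycle("en")` resolves through the maximized region "US", while `default_hour_cycle("ja")` falls through to "001" because "JP" has no entry.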

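To illustrate the approach described in ea78bac36d, where each plural rule condition is transpiled to an if-statement, here is a hand-written Python sketch of what the "cs" cardinal rules quoted in that commit amount to. The real generator emits C++; the function names here are hypothetical. Per TR35, the operand `i` is the integer digits of the number and `v` is the count of visible fraction digits, so the input is taken as a decimal string to preserve trailing zeros.

```python
def plural_operands(number):
    """Compute the TR35 operands i and v from a decimal string such as "1.50"."""
    integer_part, _, fraction_part = number.lstrip("-").partition(".")
    i = int(integer_part)    # integer digits of the absolute value
    v = len(fraction_part)   # number of visible fraction digits
    return i, v


def cs_cardinal_category(number):
    """Hand-transpiled version of the "cs" rules quoted in the commit message."""
    i, v = plural_operands(number)

    # "pluralRule-count-one": "i = 1 and v = 0"
    if i == 1 and v == 0:
        return "one"
    # "pluralRule-count-few": "i = 2..4 and v = 0"
    if 2 <= i <= 4 and v == 0:
        return "few"
    # "pluralRule-count-many": "v != 0"
    if v != 0:
        return "many"
    # "pluralRule-count-other": everything else
    return "other"
```

For example, `cs_cardinal_category("3")` yields "few" and `cs_cardinal_category("1.5")` yields "many", in line with the `@integer` and `@decimal` samples in the rule strings.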
65 commits