Commit graph

6 commits

Author SHA1 Message Date
Timothy Flynn
dff156b7c6 LibUnicode: Reduce Unicode data generator boilerplate
There's a fair amount of boilerplate when e.g. adding a new UCD file to
parse or a new enumeration to generate. Reduce the overhead by adding
helper lambdas. Also adds a couple missing spec links with UCD field
information.
2021-07-28 23:42:29 +02:00
Timothy Flynn
12fb3ae033 LibUnicode: Download and parse the word break property list UCD file
Note that unlike the main property list, each code point has only one
word break property. Code points that do not have a word break property
are to be assigned the property "Other".
2021-07-28 23:42:29 +02:00
Timothy Flynn
38adfd8874 LibUnicode: Download and parse the property list UCD file 2021-07-28 23:42:29 +02:00
Timothy Flynn
5b110034dd LibUnicode: Produce each code point's general category
This will be needed for the Unicode Standard's Default Case Algorithm.
Generate the field as an enumeration rather than a string for easier
comparison.
2021-07-27 21:04:36 +01:00
Timothy Flynn
32ea461385 LibUnicode: Download and parse the special casing UCD file
This adds a SpecialCasing structure to the generated UnicodeData.h/cpp
files. This structure contains casing rules for code points which have
non-1-to-1 upper-to-lower case code point mappings. Further, these rules
may be limited to specific locales or other context.
2021-07-27 21:04:36 +01:00
Timothy Flynn
4dda3edc9e LibUnicode: Introduce a Unicode library for interacting with UCD files
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.

As a start, LibUnicode includes upper- and lower-case code point
converters.
2021-07-26 17:03:55 +01:00