Adds methods to retrieve a Unicode property from a string and to check
if a code point matches a Unicode property.
Also adds a <LibUnicode/Forward.h> header.
DerivedCoreProperties are pseudo-properties that are the union of other
categories and properties. For example, the derived property Math is the
union of the general category Sm and the property Other_Math.
Parsing these is necessary for implementing Unicode property escapes.
But it also has the added benefit that LibUnicode now does not need to
derive some of these properties at runtime.
Originally, this was done to make the generated enums look more like the
rest of Serenity's enums. But for Unicode property escapes, LibUnicode
will need to compare property names from a RegExp.prototype object to
these parsed property names, which will be easier without this
modification.
Rather than generating the PropList as a list of enums, generate it as
a bitmask. Not only will this be better for runtime property searching,
this will allow parsing of the DerivedCoreProperties list more easily.
There's a fair amount of boilerplate when e.g. adding a new UCD file to
parse or a new enumeration to generate. Reduce the overhead by adding
helper lambdas. Also adds a couple missing spec links with UCD field
information.
Note that unlike the main property list, each code point has only one
word break property. Code points that do not have a word break property
are to be assigned the property "Other".
This will be needed for the Unicode Standard's Default Case Algorithm.
Generate the field as an enumeration rather than a string for easier
comparison.
This adds a SpecialCasing structure to the generated UnicodeData.h/cpp
files. This structure contains casing rules for code points which have
non-1-to-1 upper-to-lower case code point mappings. Further, these rules
may be limited to specific locales or other context.
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.
As a start, LibUnicode includes upper- and lower-case code point
converters.