Adding a new UCD file to parse or a new enumeration to generate
currently requires a fair amount of boilerplate. Reduce that overhead by
adding helper lambdas. Also add a couple of missing spec links with UCD
field information.
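
As a rough illustration of the helper-lambda idea, the sketch below emits
several enumerations through one lambda instead of one hand-written loop
per enumeration. It uses only the standard library; generate_enums,
generate_enum, and the enumeration names and values are hypothetical and
are not taken from the actual generator.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generator fragment; the real generator's types, names,
// and output format may differ.
inline std::string generate_enums()
{
    std::ostringstream out;

    // One helper replaces a hand-written loop per enumeration.
    auto generate_enum = [&](std::string const& name, std::vector<std::string> const& values) {
        out << "enum class " << name << " {\n";
        for (auto const& value : values)
            out << "    " << value << ",\n";
        out << "};\n";
    };

    // Adding another enumeration is now a single line.
    generate_enum("Locale", { "None", "Lt", "Tr" });
    generate_enum("Condition", { "None", "FinalSigma" });

    return out.str();
}
```

With a helper of this shape, the cost of a new enumeration is one call
rather than a copy of the emitting loop.
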

Note that, unlike in the main property list, each code point has at most
one word break property. Code points that have none are assigned the
property "Other".
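
The "Other" fallback can be sketched as a range lookup that returns a
default when nothing matches. The enum, the table layout, and the
function name below are assumptions for illustration, and the table is a
small hand-picked excerpt rather than the generated data.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical names; not the generated LibUnicode API.
enum class WordBreakProperty : uint8_t {
    Other,
    CR,
    LF,
    ALetter,
    Numeric,
};

struct WordBreakRange {
    uint32_t first;
    uint32_t last;
    WordBreakProperty property;
};

// A tiny excerpt, sorted by code point. The real table is far larger.
static constexpr std::array<WordBreakRange, 5> s_word_break_ranges { {
    { 0x000A, 0x000A, WordBreakProperty::LF },
    { 0x000D, 0x000D, WordBreakProperty::CR },
    { 0x0030, 0x0039, WordBreakProperty::Numeric },
    { 0x0041, 0x005A, WordBreakProperty::ALetter },
    { 0x0061, 0x007A, WordBreakProperty::ALetter },
} };

inline WordBreakProperty word_break_property(uint32_t code_point)
{
    // Find the first range that does not end before the code point.
    auto it = std::lower_bound(
        s_word_break_ranges.begin(), s_word_break_ranges.end(), code_point,
        [](WordBreakRange const& range, uint32_t value) { return range.last < value; });

    if (it != s_word_break_ranges.end() && it->first <= code_point)
        return it->property;

    // Code points without a listed property are assigned "Other".
    return WordBreakProperty::Other;
}
```

Because every code point resolves to exactly one value, callers can
compare the result directly instead of searching a list of properties.
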

This will be needed for the Unicode Standard's Default Case Algorithm.
Generate the field as an enumeration rather than a string for easier
comparison.

This adds a SpecialCasing structure to the generated UnicodeData.h/cpp
files. This structure contains casing rules for code points whose upper-
and lower-case mappings are not 1-to-1. Further, these rules may be
limited to specific locales or other contexts.
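
A minimal sketch of what such a structure might look like follows. The
field names, the fixed mapping size, and find_special_casing are
assumptions for illustration and may not match the generated code. The
four rules are taken from SpecialCasing.txt; context conditions such as
Final_Sigma are stored but not evaluated here.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

// Hypothetical layout; the generated structure may differ.
struct SpecialCasing {
    uint32_t code_point { 0 };
    std::array<uint32_t, 3> lowercase_mapping {};
    uint32_t lowercase_mapping_size { 0 };
    std::array<uint32_t, 3> uppercase_mapping {};
    uint32_t uppercase_mapping_size { 0 };
    std::string_view locale {};    // Empty: applies to every locale.
    std::string_view condition {}; // Empty: no context required.
};

static constexpr std::array<SpecialCasing, 4> s_special_casing { {
    { 0x00DF, { 0x00DF }, 1, { 0x0053, 0x0053 }, 2, {}, {} },   // sharp s -> "SS"
    { 0x0130, { 0x0069, 0x0307 }, 2, { 0x0130 }, 1, {}, {} },   // I with dot above
    { 0x0130, { 0x0069 }, 1, { 0x0130 }, 1, "tr", {} },         // Turkish override
    { 0x03A3, { 0x03C2 }, 1, { 0x03A3 }, 1, {}, "Final_Sigma" }, // context-dependent
} };

// Returns the locale-specific rule if one exists, otherwise the
// locale-independent rule, otherwise nullptr. Rules that require a
// context condition are skipped in this sketch.
inline SpecialCasing const* find_special_casing(uint32_t code_point, std::string_view locale)
{
    SpecialCasing const* fallback = nullptr;

    for (auto const& rule : s_special_casing) {
        if (rule.code_point != code_point || !rule.condition.empty())
            continue;
        if (!locale.empty() && rule.locale == locale)
            return &rule;
        if (rule.locale.empty())
            fallback = &rule;
    }

    return fallback;
}
```

A caller would fall back to the simple 1-to-1 mapping whenever no rule
is returned.
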

The Unicode Consortium publishes the Unicode Character Database (UCD),
which holds information about every code point, such as each code
point's upper case mapping. LibUnicode exists to download and parse UCD
files at build time and to provide accessors to that data.

As a start, LibUnicode includes upper- and lower-case code point
converters.
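
For illustration only, a converter over parsed UCD data could look like
the sketch below. The table holds a few simple case mappings from
UnicodeData.txt; SimpleCaseMapping, to_uppercase_sketch, and
to_lowercase_sketch are hypothetical names and not the library's
accessors.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical row type; a code point without a mapping maps to itself.
struct SimpleCaseMapping {
    uint32_t code_point;
    uint32_t simple_uppercase;
    uint32_t simple_lowercase;
};

// A small excerpt, sorted by code point.
static constexpr std::array<SimpleCaseMapping, 6> s_case_mappings { {
    { 0x0041, 0x0041, 0x0061 }, // A
    { 0x0061, 0x0041, 0x0061 }, // a
    { 0x00C9, 0x00C9, 0x00E9 }, // E with acute
    { 0x00E9, 0x00C9, 0x00E9 }, // e with acute
    { 0x03A3, 0x03A3, 0x03C3 }, // capital sigma
    { 0x03C3, 0x03A3, 0x03C3 }, // small sigma
} };

inline SimpleCaseMapping const* find_case_mapping(uint32_t code_point)
{
    auto it = std::lower_bound(
        s_case_mappings.begin(), s_case_mappings.end(), code_point,
        [](SimpleCaseMapping const& row, uint32_t value) { return row.code_point < value; });

    if (it != s_case_mappings.end() && it->code_point == code_point)
        return &*it;
    return nullptr;
}

inline uint32_t to_uppercase_sketch(uint32_t code_point)
{
    auto const* row = find_case_mapping(code_point);
    return row ? row->simple_uppercase : code_point;
}

inline uint32_t to_lowercase_sketch(uint32_t code_point)
{
    auto const* row = find_case_mapping(code_point);
    return row ? row->simple_lowercase : code_point;
}
```

Code points that are absent from the table are returned unchanged.
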