0ct0pu5/ladybird

Author	SHA1	Message	Date
Timothy Flynn	7e6ad172a4	LibUnicode: Support code point names that apply to ranges of code points For example, consider the following adjacent entries in UnicodeData.txt: 3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;; 4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;; Our current implementation would assign the display name "CJK Ideograph Extension A" to code points U+3400 & U+4DBF, but not to the code points in between. Not only should those code points be assigned a name, but the Unicode spec also has formatting rules on what the names should be (the names for these ranged code points are not as they appear in UnicodeData.txt). The spec also defines names for code point ranges that actually are listed individually in UnicodeData.txt. For example: 2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;; 2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;; 2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;; Code points are only coalesced into a range if all fields after the name are equivalent. Our parser will insert the range and its name formatting pattern when it comes across the first code point in that range, then ignore other code points in that range. This reduces the number of names we generated by nearly 2,000.	2021-11-30 11:24:02 +01:00
Timothy Flynn	f2f4980f15	LibUnicode: Remove unused field from UnicodeData generator	2021-11-30 11:24:02 +01:00
Timothy Flynn	88dbf3c348	LibUnicode: Port GenerateUnicodeData to ErrorOr and LibMain Also store command line arguments as StringViews rather than pointers.	2021-11-23 22:58:05 +01:00
Ben Wiederhake	b06b54772e	Meta+LibUnicode: Provide code point names through library	2021-11-20 00:31:55 +01:00
Timothy Flynn	9d1519e21c	LibUnicode: Move GenerateUnicodeData's Alias struct to generator header This will be used for locale aliases as well. Also rename the "property" field in this struct to "name", as it no longer is only used for property aliases.	2021-11-19 11:45:35 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Timothy Flynn	f91d63af83	LibUnicode: Generate enum/alias from-string methods without a HashMap The *_from_string() and resolve_*_alias() generated methods are the last remaining users of HashMap in the LibUnicode generated files (read: the last methods not using compile-time structures). This converts these methods to use an array containing pairs of hash values to the desired lookup value. Because this code generation is the same between GenerateUnicodeData.cpp and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the LibUnicode generators to contain the method that generates the methods.	2021-10-13 16:38:51 +02:00
Timothy Flynn	79707d83d3	LibUnicode: Stop generating large UnicodeData hash map The data in this hash map is now available by way of much smaller arrays and is now unused.	2021-10-10 13:49:37 +02:00
Timothy Flynn	d83b262e64	LibUnicode: Generate standalone compile-time array for combining class	2021-10-10 13:49:37 +02:00
Timothy Flynn	9f83774913	LibUnicode: Generate standalone compile-time array for special casing There are only 112 code points with special casing rules, so this array is quite small (compared to the size 34,626 UnicodeData hash map that is also storing this data). Removing all casing rules from UnicodeData will happen in a subsequent commit.	2021-10-10 13:49:37 +02:00
Timothy Flynn	da4b8897a7	LibUnicode: Generate standalone compile-time arrays for simple casing Currently, all casing information (simple and special) is stored in a compile-time array of size 34,626, then statically copied to a hash map at runtime. In an effort to reduce the resulting memory usage, store the simple casing rules in standalone compile-time arrays. The uppercase map is size 1,450 and the lowercase map is size 1,433. Any code point not in a map will implicitly have an identity mapping.	2021-10-10 13:49:37 +02:00
Nico Weber	9ec9886b04	Meta: Fix typos	2021-10-01 01:06:40 +01:00
Timothy Flynn	c8dbcdb0bc	LibUnicode: Do not compare generated file contents before writing This is now covered by unicode_data.cmake after the superbuild changes.	2021-09-30 17:37:57 +01:00
Idan Horowitz	6704961c82	AK: Replace the mutable String::replace API with an immutable version This removes the awkward String::replace API, which was the only String API that mutated the String, and replaces it with a new immutable version that returns a new String with the replacements applied. This also fixes a couple of UAFs that were caused by the use of this API. As an optimization, an equivalent StringView::replace API was also added to remove unnecessary String allocations of the form: `String { view }.replace(...);`	2021-09-11 20:36:43 +03:00
Timothy Flynn	077a693de6	LibUnicode: Sort special casing array by locale specificity This is to simplify the Default Case Conversion implementation. Otherwise, the implementation would need to determine which special casing rule to apply, instead of just picking the first match.	2021-09-06 15:24:27 +01:00
Timothy Flynn	91db61ae8d	LibUnicode: Generate canonical combining class in Unicode data Will be used by special casing rules.	2021-09-06 15:24:27 +01:00
Andrew Kaster	63956b36d0	Everywhere: Move all host tools into the Lagom/Tools subdirectory This allows us to remove all the add_subdirectory calls from the top level CMakeLists.txt that referred to targets linking LagomCore. Segregating the host tools and Serenity targets helps us get to a place where the main Serenity build can simply use a CMake toolchain file rather than swapping all the compiler/sysroot variables after building host libraries and tools.	2021-08-28 08:44:17 +01:00

17 commits
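
The range-name handling described in commit 7e6ad172a4 can be illustrated with a minimal sketch. It assumes a sorted table of ranges, each carrying a name prefix to which the code point is appended in uppercase hexadecimal (the formatting the Unicode spec uses for these ideograph ranges). The struct, table, and function names are hypothetical and are not taken from the LibUnicode generator or its output. The two table entries are only illustrative samples.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical types and names for illustration only; this is not the
// code that GenerateUnicodeData actually emits.
struct CodePointRangeName {
    uint32_t first;
    uint32_t last;
    std::string_view prefix; // The code point is appended in uppercase hex.
};

// Sorted by code point so that a binary search can locate a range.
// Only two sample ranges are listed here.
constexpr std::array<CodePointRangeName, 2> s_range_names { {
    { 0x3400, 0x4DBF, "CJK UNIFIED IDEOGRAPH-" },
    { 0x2F800, 0x2FA1D, "CJK COMPATIBILITY IDEOGRAPH-" },
} };

// Builds the display name for a code point known to lie inside the range.
static std::string format_range_name(CodePointRangeName const& range, uint32_t code_point)
{
    assert(range.first <= code_point && code_point <= range.last);

    char hex[9];
    std::snprintf(hex, sizeof(hex), "%04X", static_cast<unsigned>(code_point));
    return std::string(range.prefix) + hex;
}

// Returns the formatted name if the code point falls in a known range.
std::optional<std::string> code_point_range_name(uint32_t code_point)
{
    // Find the first range whose last code point is >= code_point.
    auto it = std::lower_bound(s_range_names.begin(), s_range_names.end(), code_point,
        [](CodePointRangeName const& range, uint32_t value) { return range.last < value; });

    if (it == s_range_names.end() || code_point < it->first)
        return std::nullopt;
    return format_range_name(*it, code_point);
}
```

With this table, `code_point_range_name(0x3401)` yields "CJK UNIFIED IDEOGRAPH-3401", while a code point outside every range (such as U+0041) yields an empty optional, so the caller would fall back to the individually listed names. Storing one entry per range rather than one per code point is what lets the number of stored names shrink.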