0ct0pu5/ladybird

Author	SHA1	Message	Date
demostanis	3e8b5ac920	AK+Everywhere: Turn bool keep_empty to an enum in split* functions	2022-10-24 23:29:18 +01:00
Timothy Flynn	f08a979b96	LibUnicode: Remove GCC codegen workaround Reverts commits: `ffbf5596cd` `f190e394b3`	2022-10-07 18:21:40 +01:00
Timothy Flynn	f38c68177b	LibUnicode: Update code point ideographic replacements for Unicode 15	2022-10-07 18:17:40 +01:00
Andreas Kling	f190e394b3	LibUnicode: Let's use the GCC 11/12 workaround on all platforms I seem to be getting some miscompiles on Linux as well, so let's make the hitherto macOS-specific workaround universal.	2022-10-06 17:15:28 +02:00
matcool	70d0c1616f	LibUnicode: Add decomposition mappings and Unicode normalization The mappings are exposed via `Unicode::code_point_decomposition(u32)` and `Unicode::code_point_decompositions()`, the latter being useful for reverse searching a code point from its decomposition. The normalization code does not make use of `Quick_Check` props (https://www.unicode.org/reports/tr44/#Decompositions_and_Normalization), meaning no quick check optimizations.	2022-10-06 08:24:39 -04:00
Nico Weber	2af028132a	AK+Everywhere: Add AK_COMPILER_{GCC,CLANG} and use them most places Doesn't use them in libc headers so that those don't have to pull in AK/Platform.h. AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is defined in clang builds as well.) Using AK_COMPILER_GCC simplifies things some. AK_COMPILER_CLANG isn't as much of a win, other than that it's consistent with AK_COMPILER_GCC.	2022-10-04 23:35:07 +01:00
Nico Weber	ffbf5596cd	Lagom: Work around gcc codegen bug Without this, GenerateUnicodeData crashes when run during the build. With this, `serenity.sh run` brings up a running SerenityOS. Since GenerateUnicodeData doesn't take a lot of time to run, just disable optimizations to work around the problem for now. Works around #15449.	2022-10-03 15:30:51 +01:00
Timothy Flynn	f082b6ae48	LibUnicode: Generate a separate Locale enumeration for special casing The UCD only cares about a few locales for special casing rules (az, lt, and tr). Unfortunately, LibUnicode cannot use LibLocale once the libraries are separate because LibLocale will need to use LibUnicode for many more things; thus there would be a circular dependency. Instead, just generate the small enum needed for this one use case.	2022-09-05 14:37:16 -04:00
Timothy Flynn	ff48220dca	Userland: Move files destined for LibLocale to the Locale namespace	2022-09-05 14:37:16 -04:00
Timothy Flynn	1e0276f541	LibLocale+LibUnicode: Move generated CLDR data files to LibLocale folder They are still included into LibUnicode, but this moves their generated location to be under LibLocale.	2022-09-05 14:37:16 -04:00
Timothy Flynn	89d1813b5d	LibUnicode: Move CLDR data generators to a LibLocale subfolder To prepare for placing all CLDR generated data in a new library, LibLocale, this moves the code generators for the CLDR data to the LibLocale subfolder.	2022-09-05 14:37:16 -04:00
Timothy Flynn	ca92e37ae0	LibUnicode: Generate code point display names with run-length encoding Similar to commit `becec35`, our code point display name data was a large list of StringViews. RLE can be used here as well to move about 32 MB from the initialized data section to the read-only section. Some of the refactoring to store strings as indices into an RLE array also lets us clean up some of the code point name generators.	2022-08-17 15:42:12 +01:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses with String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00
Sam Atkins	d564cf1e89	LibCore+Everywhere: Make Core::Stream read_line() return StringView Similar reasoning to making Core::Stream::read() return Bytes, except that every user of read_line() creates a StringView from the result, so let's just return one right away.	2022-04-16 13:27:51 -04:00
thankyouverycool	0505e031f1	Meta+LibUnicode: Download and parse Unicode block properties This parses Blocks.txt for CharacterType properties and creates a global display array for use in apps.	2022-02-15 10:13:19 -05:00
Timothy Flynn	a64a7940e4	LibUnicode: Port the UCD generator to the stream API	2022-02-14 11:39:46 -05:00
Idan Horowitz	2d50c08f34	LibUnicode: Download and parse {Grapheme,Word,Sentence} break props	2022-01-31 21:05:04 +02:00
Timothy Flynn	6efbafa6e0	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	2022-01-31 18:23:22 +00:00
Timothy Flynn	701b7810ba	LibUnicode: Generate code point abbreviations	2022-01-18 15:13:25 +00:00
Timothy Flynn	437b9fe204	LibUnicode: Convert UnicodeData to link with weak symbols	2022-01-04 22:49:43 +00:00
Timothy Flynn	cf8e11a562	LibUnicode: Add temporary overload of value-from-string generator This is a temporary mechanism while LibUnicode is in an in-between state where some symbols are weakly linked and others are dynamically loaded. The latter require an asm() label to be loaded.	2022-01-04 22:49:43 +00:00
Timothy Flynn	52394deece	LibUnicode: Remove now unused value-from-string generator overload The generate_value_from_string_for_dynamic_loading() overload was just temporary until all generators were switched over to dynamic loading.	2021-12-21 13:09:49 -08:00
Timothy Flynn	3fd53baa25	LibUnicode: Dynamically load the generated UnicodeData symbols The generated data for libunicodedata.so is quite large, and loading it is a price paid by nearly every application by way of depending on LibRegex. In order to defer this cost until an application actually uses one of the surrounding APIs, dynamically load the generated symbols. To be able to load the symbols dynamically, the generated methods must have demangled names. Typically, this is accomplished with `extern "C"` blocks. The clang toolchain complains about this here because the types returned from the generators are strictly C++ types. So to demangle the names, we use the asm() compiler directive to manually define a symbol name; the caveat is that we must be sure the symbols are unique. As an extra precaution, we prefix each symbol name with "unicode_". For more details, see: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html The symbol loader used in this implementation provides the additional benefit of removing many [[maybe_unused]] attributes from the LibUnicode methods. Internally, if ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF, the loader is able to stub out the function pointers it returns. Note that as of this commit, LibUnicode is still directly linked against LibUnicodeData. This commit is just a first step towards removing that.	2021-12-21 13:09:49 -08:00
Timothy Flynn	7e6ad172a4	LibUnicode: Support code point names that apply to ranges of code points For example, consider the following adjacent entries in UnicodeData.txt: 3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;; 4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;; Our current implementation would assign the display name "CJK Ideograph Extension A" to code points U+3400 & U+4DBF, but not to the code points in between. Not only should those code points be assigned a name, but the Unicode spec also has formatting rules on what the names should be (the names for these ranged code points are not as they appear in UnicodeData.txt). The spec also defines names for code point ranges that actually are listed individually in UnicodeData.txt. For example: 2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;; 2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;; 2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;; Code points are only coalesced into a range if all fields after the name are equivalent. Our parser will insert the range and its name formatting pattern when it comes across the first code point in that range, then ignore other code points in that range. This reduces the number of names we generated by nearly 2,000.	2021-11-30 11:24:02 +01:00
Timothy Flynn	f2f4980f15	LibUnicode: Remove unused field from UnicodeData generator	2021-11-30 11:24:02 +01:00
Timothy Flynn	88dbf3c348	LibUnicode: Port GenerateUnicodeData to ErrorOr and LibMain Also store command line arguments as StringViews rather than pointers.	2021-11-23 22:58:05 +01:00
Ben Wiederhake	b06b54772e	Meta+LibUnicode: Provide code point names through library	2021-11-20 00:31:55 +01:00
Timothy Flynn	9d1519e21c	LibUnicode: Move GenerateUnicodeData's Alias struct to generator header This will be used for locale aliases as well. Also rename the "property" field in this struct to "name", as it no longer is only used for property aliases.	2021-11-19 11:45:35 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Timothy Flynn	f91d63af83	LibUnicode: Generate enum/alias from-string methods without a HashMap The *_from_string() and resolve_*_alias() generated methods are the last remaining users of HashMap in the LibUnicode generated files (read: the last methods not using compile-time structures). This converts these methods to use an array containing pairs of hash values to the desired lookup value. Because this code generation is the same between GenerateUnicodeData.cpp and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the LibUnicode generators to contain the method that generates the methods.	2021-10-13 16:38:51 +02:00
Timothy Flynn	79707d83d3	LibUnicode: Stop generating large UnicodeData hash map The data in this hash map is now available by way of much smaller arrays and is now unused.	2021-10-10 13:49:37 +02:00
Timothy Flynn	d83b262e64	LibUnicode: Generate standalone compile-time array for combining class	2021-10-10 13:49:37 +02:00
Timothy Flynn	9f83774913	LibUnicode: Generate standalone compile-time array for special casing There are only 112 code points with special casing rules, so this array is quite small (compared to the size 34,626 UnicodeData hash map that is also storing this data). Removing all casing rules from UnicodeData will happen in a subsequent commit.	2021-10-10 13:49:37 +02:00
Timothy Flynn	da4b8897a7	LibUnicode: Generate standalone compile-time arrays for simple casing Currently, all casing information (simple and special) is stored in a compile-time array of size 34,626, then statically copied to a hash map at runtime. In an effort to reduce the resulting memory usage, store the simple casing rules in standalone compile-time arrays. The uppercase map is size 1,450 and the lowercase map is size 1,433. Any code point not in a map will implicitly have an identity mapping.	2021-10-10 13:49:37 +02:00
Nico Weber	9ec9886b04	Meta: Fix typos	2021-10-01 01:06:40 +01:00
Timothy Flynn	c8dbcdb0bc	LibUnicode: Do not compare generated file contents before writing This is now covered by unicode_data.cmake after the superbuild changes.	2021-09-30 17:37:57 +01:00
Idan Horowitz	6704961c82	AK: Replace the mutable String::replace API with an immutable version This removes the awkward String::replace API, which was the only String API that mutated the String, and replaces it with a new immutable version that returns a new String with the replacements applied. This also fixes a couple of UAFs that were caused by the use of this API. As an optimization, an equivalent StringView::replace API was also added to remove unnecessary String allocations of the form: `String { view }.replace(...);`	2021-09-11 20:36:43 +03:00
Timothy Flynn	077a693de6	LibUnicode: Sort special casing array by locale specificity This is to simplify the Default Case Conversion implementation. Otherwise, the implementation would need to determine which special casing rule to apply, instead of just picking the first match.	2021-09-06 15:24:27 +01:00
Timothy Flynn	91db61ae8d	LibUnicode: Generate canonical combining class in Unicode data Will be used by special casing rules.	2021-09-06 15:24:27 +01:00
Andrew Kaster	63956b36d0	Everywhere: Move all host tools into the Lagom/Tools subdirectory This allows us to remove all the add_subdirectory calls from the top level CMakeLists.txt that referred to targets linking LagomCore. Segregating the host tools and Serenity targets helps us get to a place where the main Serenity build can simply use a CMake toolchain file rather than swapping all the compiler/sysroot variables after building host libraries and tools.	2021-08-28 08:44:17 +01:00

41 commits
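
Commit `da4b8897a7` above describes storing the simple casing rules in standalone compile-time arrays, with any code point missing from an array implicitly mapping to itself. A minimal sketch of that idea follows. It is not the generated LibUnicode code: the names `CasingEntry`, `s_uppercase_mappings` and `to_simple_uppercase`, the four sample entries, and the use of the standard library rather than AK are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical miniature of a simple-casing table. The commit message gives
// 1,450 entries for the real uppercase map; four are enough to show the idea.
struct CasingEntry {
    uint32_t code_point;
    uint32_t mapping;
};

// Entries must stay sorted by code_point so that binary search is valid.
static constexpr std::array<CasingEntry, 4> s_uppercase_mappings { {
    { 0x0061, 0x0041 }, // LATIN SMALL LETTER A -> LATIN CAPITAL LETTER A
    { 0x007A, 0x005A }, // LATIN SMALL LETTER Z -> LATIN CAPITAL LETTER Z
    { 0x00E9, 0x00C9 }, // LATIN SMALL LETTER E WITH ACUTE -> capital form
    { 0x03B1, 0x0391 }, // GREEK SMALL LETTER ALPHA -> GREEK CAPITAL LETTER ALPHA
} };

// Looks up the simple uppercase mapping for a code point. Code points that do
// not appear in the table fall through to the identity mapping.
uint32_t to_simple_uppercase(uint32_t code_point)
{
    auto it = std::lower_bound(
        s_uppercase_mappings.begin(), s_uppercase_mappings.end(), code_point,
        [](CasingEntry const& entry, uint32_t value) { return entry.code_point < value; });

    if (it != s_uppercase_mappings.end() && it->code_point == code_point)
        return it->mapping;
    return code_point;
}
```

Because the table is a `constexpr` array, it can be placed in read-only data and needs no runtime initialization, in contrast to the hash map copy that the commit message says it replaces.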