beenull/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-25 09:00:22 +00:00

Author	SHA1	Message	Date
Dennis Camera	b54a1c6284	AK: Implement ShortString for big-endian	2024-07-05 09:49:23 -06:00
Timothy Flynn	5cf818e305	LibUnicode: Replace case transformations and comparison with ICU's There are a couple of differences here due to using ICU: 1. Titlecasing behaves slightly differently. We previously transformed "123dollars" to "123Dollars", as we would use word segmentation to split a string into words, then transform the first cased character to titlecase. ICU doesn't go quite that far, and leaves the string as "123dollars". While this is a behavior change, the only user of this API is the `text-transform: capitalize;` CSS rule, and we now match the behavior of other browsers. 2. There isn't an API to compare strings with case insensitivity without allocating case-folded strings for both the left- and right-hand-side strings. Our implementation was previously allocation-free; however, in a benchmark, ICU is still ~1.4x faster.	2024-06-20 10:59:55 +02:00
Timothy Flynn	fe3fde2411	AK+LibUnicode: Implement a case-insensitive variant of find_byte_offset The existing String::find_byte_offset is case-sensitive. This variant allows performing searches using Unicode-aware case folding.	2024-06-01 07:37:54 +02:00
Shannon Booth	d777b279e3	LibUnicode+Tests: Remove now unused `to_unicode_*_full` methods Relocating all of the tests for these in LibUnicode over to the AK String testsuite.	2023-11-28 17:15:27 -05:00
Timothy Flynn	6aa334767f	AK: Ensure assigned-to Strings are dereferenced if needed If we assign to an existing non-short string, we must dereference its StringData object to prevent leaking that data.	2023-11-28 16:38:18 +01:00
Lucas CHOLLET	fde26c53f0	AK: Remove the API to explicitly construct short strings Now that ""_string is infallible, the only benefit of explicitly constructing a short string is the ability to do it at compile-time. But we never do that, so let's simplify the API and remove this implementation detail from it.	2023-08-08 07:37:21 +02:00
Lucas CHOLLET	3f35ffb648	Userland: Prefer `_string` over `_short_string` As `_string` can't fail anymore (since `3434412`), there are no real benefits to use the short variant in most cases.	2023-08-08 07:37:21 +02:00
Andreas Kling	34344120f2	AK: Make "foo"_string infallible Stop worrying about tiny OOMs. Work towards #20405.	2023-08-07 16:03:27 +02:00
Tim Schumacher	ae51c1821c	Everywhere: Remove unintentional partial stream reads and writes	2023-03-13 15:16:20 +00:00
Tim Schumacher	d5871f5717	AK: Rename Stream::{read,write} to Stream::{read_some,write_some} Similar to POSIX read, the basic read and write functions of AK::Stream do not have a lower limit of how much data they read or write (apart from "none at all"). Rename the functions to "read some [data]" and "write some [data]" (with "data" being omitted, since everything here is reading and writing data) to make them sufficiently distinct from the functions that ensure to use the entire buffer (which should be the go-to function for most usages). No functional changes, just a lot of new FIXMEs.	2023-03-13 15:16:20 +00:00
Timothy Flynn	1393ed2000	AK+LibUnicode: Implement String::equals_ignoring_case without allocating We currently fully casefold the left- and right-hand sides to compare two strings with case-insensitivity. Now, we casefold one code point at a time, storing the result in a view for comparison, until we exhaust both strings.	2023-03-08 18:57:53 +00:00
Timothy Flynn	515fca4f7a	AK: Make String::contains(code_point) handle non-ASCII We currently only accept a char, instead of a full code point.	2023-03-08 14:16:47 +00:00
Timothy Flynn	f882581e91	AK: Make String::{starts,ends}_with(code_point) handle non-ASCII We currently pass the code point to StringView::{starts,ends}_with, which actually accepts a single char, thus cannot handle non-ASCII code points.	2023-03-08 14:16:47 +00:00
Timothy Flynn	da0d000909	AK: Ensure short String instances are valid UTF-8 We are currently only validating long strings.	2023-03-03 11:46:42 -05:00
Linus Groh	09d40bfbb2	Everywhere: Use _{short_,}string to create Strings from literals	2023-02-25 20:51:49 +01:00
Andrew Kaster	0ea697ace5	AK: Add String::from_stream method The caller is responsible for determining how long the string is that they want to read.	2023-02-21 10:57:44 +01:00
Andreas Kling	d0697d350d	AK: Fix 64-bit alignment issue in shared-superstring substrings Thanks to Timothy Flynn for the test! Fixes #17141	2023-02-18 09:12:46 -05:00
Timothy Flynn	5cbf054651	LibUnicode: Fix typos causing text segmentation on mid-word punctuation For example the words "can't" and "32.3" should not have boundaries detected on the "'" and "." code points, respectively. The String test cases fixed here are because "b'ar" is now considered one word.	2023-02-15 12:36:47 +01:00
Timothy Flynn	c59268d15b	AK: Add String::trim	2023-01-28 00:13:46 +00:00
Timothy Flynn	cccaa94767	AK: Add String::join	2023-01-28 00:13:46 +00:00
Timothy Flynn	c35b1371a3	AK: Add an overload of String::find_byte_offset for StringView	2023-01-27 18:00:17 +00:00
Timothy Flynn	427b82065c	AK: Add a method to create a String with a repeated code point	2023-01-24 16:23:50 -05:00
Timothy Flynn	d50724956e	AK: Add a method to find the byte offset of a code point	2023-01-24 16:23:50 -05:00
Timothy Flynn	12c8bc3e85	AK: Add a String factory to create a string from a single code point	2023-01-22 01:03:13 +00:00
martinfalisse	aec2dadfdd	AK: Add `split()` for `String`	2023-01-21 14:35:00 +01:00
Timothy Flynn	d48266a420	AK: Support creating known short string literals at compile time In cases where we know a string literal will fit in the short string storage, we can do so at compile time without needing to handle error propagation. If the provided string literal is too long, a compilation error will be emitted due to the failed VERIFY statement being a non-constant expression.	2023-01-20 14:24:12 -05:00
Timothy Flynn	537fcaf59e	AK+LibUnicode: Provide Unicode-aware caseless String matching The Unicode spec defines much more complicated caseless matching algorithms in its Collation spec. This implements the "basic" case folding comparison.	2023-01-18 14:43:40 +00:00
Timothy Flynn	d6ddca0c0f	AK+LibUnicode: Provide Unicode-aware String titlecase transformation	2023-01-16 18:33:44 -05:00
Timothy Flynn	bd9b65e82f	AK: Add String::is_one_of for variadic string comparison	2023-01-15 01:00:20 +00:00
Timothy Flynn	9db9b2f9be	AK: Add a somewhat naive implementation of String::reverse This will reverse the String's code points (i.e. not just its bytes), but is not aware of grapheme clusters.	2023-01-15 01:00:20 +00:00
Timothy Flynn	6fcc1c7426	AK+LibUnicode: Provide Unicode-aware String case transformations Since AK can't refer to LibUnicode directly, the strategy here is that if you need case transformations, you can link LibUnicode and receive them. If you try to use either of these methods without linking it, then you'll of course get a linker error (note we don't do any fallbacks to e.g. ASCII case transformations). If you don't need these methods, you don't have to link LibUnicode.	2023-01-09 19:23:46 -07:00
Maciej	58f5deba70	AK: Unref old m_data in String's move assignment We were overriding the data pointer without unreffing it, causing a memory leak when assigning a String.	2022-12-09 00:02:53 +01:00
Andreas Kling	a3e82eaad3	AK: Introduce the new String, replacement for DeprecatedString DeprecatedString (formerly String) has been with us since the start, and it has served us well. However, it has a number of shortcomings that I'd like to address. Some of these issues are hard if not impossible to solve incrementally inside of DeprecatedString, so instead of doing that, let's build a new String class and then incrementally move over to it instead. Problems in DeprecatedString: - It assumes string allocation never fails. This makes it impossible to use in allocation-sensitive contexts, and is the reason we had to ban DeprecatedString from the kernel entirely. - The awkward null state. DeprecatedString can be null. It's different from the empty state, although null strings are considered empty. All code is immediately nicer when using Optional<DeprecatedString> but DeprecatedString came before Optional, which is how we ended up like this. - The encoding of the underlying data is ambiguous. For the most part, we use it as if it's always UTF-8, but there have been cases where we pass around strings in other encodings (e.g. ISO8859-1) - operator[] and length() are used to iterate over DeprecatedString one byte at a time. This is done all over the codebase, and will not give the right results unless the string is all ASCII. How we solve these issues in the new String: - Functions that may allocate now return ErrorOr<String> so that ENOMEM errors can be passed to the caller. - String has no null state. Use Optional<String> when needed. - String is always UTF-8. This is validated when constructing a String. We may need to add a bypass for this in the future, for cases where you have a known-good string, but for now: validate all the things! - There is no operator[] or length(). You can get the underlying data with bytes(), but for iterating over code points, you should be using a UTF-8 iterator. 
Furthermore, it has two nifty new features: - String implements a small string optimization (SSO) for strings that can fit entirely within a pointer. This means up to 3 bytes on 32-bit platforms, and 7 bytes on 64-bit platforms. Such small strings will not be heap-allocated. - String can create substrings without making a deep copy of the substring. Instead, the superstring gets +1 refcount from the substring, and it acts like a view into the superstring. To make substrings like this, use the substring_with_shared_superstring() API. One caveat: - String does not guarantee that the underlying data is null-terminated like DeprecatedString does today. While this was nifty in a handful of places where we were calling C functions, it did stand in the way of shared-superstring substrings.	2022-12-06 15:21:26 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
demostanis	3e8b5ac920	AK+Everywhere: Turn bool keep_empty to an enum in split* functions	2022-10-24 23:29:18 +01:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses by String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00
Daniel Bertalan	e15d6125b2	Tests: Move sprintf test from AK/ to LibC/ This test doesn't test AK::String, but LibC's sprintf instead, so it does not belong in `Tests/AK`. This also means this test won't be run on Lagom using the host OS's printf implementation. Fixes a deprecated declaration warning when compiling with macOS SDK 13.	2022-07-04 21:46:02 +02:00
Daniel Bertalan	8473f6caee	AK+Tests: Make null strings compare less than non-null strings This behavior regressed in `ca58c71faa`. Fixes #12213	2022-01-30 17:23:02 +00:00
Andreas Kling	79ee846f3d	AK: Disable the empty-string-vs-null-string test until we have a fix	2022-01-30 16:21:59 +01:00
networkException	1921a166e5	Tests: Add test for null string and empty string to be unequal See #12213	2022-01-30 15:24:35 +01:00
Matt Jacobson	47e8d58553	AK: Fix logic in String::operator>(const String&) Null strings should not compare greater than non-null strings. Add tests for >, <, >=, and <= comparison involving null strings.	2022-01-16 11:08:23 +01:00
Andreas Kling	5f7d008791	AK+Everywhere: Stop including Vector.h from StringView.h Preparation for using Error.h from Vector.h. This required moving some things out of line.	2021-11-10 21:58:58 +01:00
Idan Horowitz	6704961c82	AK: Replace the mutable String::replace API with an immutable version This removes the awkward String::replace API which was the only String API which mutated the String and replaces it with a new immutable version that returns a new String with the replacements applied. This also fixes a couple of UAFs that were caused by the use of this API. As an optimization an equivalent StringView::replace API was also added to remove unnecessary String allocations of the form: `String { view }.replace(...);`	2021-09-11 20:36:43 +03:00
Mandar Kulkarni	aaf232f903	Tests: Add test for String::bijective_base_from()	2021-08-09 14:14:07 +04:30
Tobias Christiansen	f35c25a7eb	Tests: Add test for String::roman_number_from()	2021-07-04 22:17:03 +02:00
Max Wipfli	4b87dd5c5c	Tests: Add test for String::find with empty needle This adds a test case for String::find and String::find_all with empty needles. The expected behavior is in line with what the C++ standard library (and other languages' standard libraries) expect.	2021-07-02 21:54:21 +02:00
Andreas Kling	de395a3df2	AK+Everywhere: Consolidate String::index_of() and String::find() We had two functions for doing mostly the same thing. Combine both of them into String::find() and use that everywhere. Also add some tests to cover basic behavior.	2021-05-24 11:59:18 +02:00
Maciej Zygmanowski	80077cea86	AK: Add String::find_all() and String::count()	2021-05-19 20:51:51 +01:00
Brian Gianforcaro	67322b0702	Tests: Move AK tests to Tests/AK	2021-05-06 17:54:28 +02:00

50 commits
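
Commit a3e82eaad3 above states that the small string optimization holds up to 3 bytes on 32-bit platforms and 7 bytes on 64-bit platforms. The arithmetic can be sketched as follows, assuming one byte of the pointer-sized storage is reserved for bookkeeping. This is an illustration only, not AK::String's actual layout: `ShortStringSketch`, `max_short_string_byte_count`, `fits_in_short_string`, `make_short_string` and `view_of` are hypothetical names, and the real class may encode its length and tag differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical sketch, not the AK implementation.
// Assumption: one byte of the pointer-sized storage stores the length,
// which leaves sizeof(void*) - 1 bytes for the string data
// (3 bytes with 4-byte pointers, 7 bytes with 8-byte pointers).
constexpr std::size_t max_short_string_byte_count = sizeof(void*) - 1;

struct ShortStringSketch {
    std::uint8_t length { 0 };
    char bytes[max_short_string_byte_count] {};
};

// The whole short string occupies exactly as much space as a pointer.
static_assert(sizeof(ShortStringSketch) == sizeof(void*));

// Returns true if the text could be stored inline, without a heap allocation.
inline bool fits_in_short_string(std::string_view text)
{
    return text.size() <= max_short_string_byte_count;
}

// Copies the text into inline storage. The caller must check the size first.
inline ShortStringSketch make_short_string(std::string_view text)
{
    assert(fits_in_short_string(text));
    ShortStringSketch result;
    result.length = static_cast<std::uint8_t>(text.size());
    if (!text.empty())
        std::memcpy(result.bytes, text.data(), text.size());
    return result;
}

// Returns a view of the inline bytes. It is valid only while the sketch object lives.
inline std::string_view view_of(ShortStringSketch const& string)
{
    return { string.bytes, string.length };
}
```

Under these assumptions, a string such as "abc" fits inline on both 32-bit and 64-bit platforms, while any string of 8 or more bytes would need a heap allocation on either.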