beenull/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-22 15:40:19 +00:00

Author	SHA1	Message	Date
Diego	7560b640f3	AK: Add `AllowSurrogates` to UTF-8 validator The [UTF-8](https://datatracker.ietf.org/doc/html/rfc3629#page-5) standard says to reject strings with upper or lower surrogates. However, in many standards, ECMAScript included, unpaired surrogates (and therefore UTF-8 surrogates) are allowed in strings. So, this commit extends the UTF-8 validation API with `AllowSurrogates`, which controls whether upper and lower surrogate characters are rejected.	2024-06-09 12:16:32 +02:00
Timothy Flynn	1b4a23095c	AK: Add a Utf16View::starts_with method Based heavily on Utf8View::starts_with.	2024-01-04 12:43:10 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only \| grep \.cpp\\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Nico Weber	aa9037eed4	AK: Add spec comments to Utf16CodePointIterator::operator*()	2023-01-22 21:30:44 +00:00
Timothy Flynn	2eacc7aec1	AK: Add Utf16View::to_utf8 to convert the view to a UTF-8 AK::String	2023-01-09 23:00:24 +00:00
Timothy Flynn	d0403ec14f	AK+Everywhere: Rename Utf16View::to_utf8 to to_deprecated_string A subsequent commit will add to_utf8 back to create an AK::String.	2023-01-09 23:00:24 +00:00
Timothy Flynn	d793262beb	AK+Everywhere: Make UTF-16 to UTF-8 converter fallible This could fail to allocate the underlying storage needed to store the UTF-8 data. Propagate this error.	2023-01-08 12:13:15 +01:00
Timothy Flynn	1edb96376b	AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible These could fail to allocate the underlying storage needed to store the UTF-16 data. Propagate these errors.	2023-01-08 12:13:15 +01:00
Timothy Flynn	425c168ded	AK+LibJS+LibRegex: Define an alias for UTF-16 string data storage Instead of writing out "Vector<u16, 1>" everywhere, let's have a name for it.	2023-01-08 12:13:15 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Linus Groh	d26aabff04	Everywhere: Run clang-format	2022-12-03 23:52:23 +00:00
Idan Horowitz	44e8c05c67	AK: Add a Utf16View::code_unit_offset_of(Utf16CodePointIterator) helper This helper can be used to retrieve the code unit offset of an active Utf16CodePointIterator efficiently.	2022-01-31 21:05:04 +02:00
Timothy Flynn	6efbafa6e0	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	2022-01-31 18:23:22 +00:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Andreas Kling	87290e300e	AK: Simplify Utf16View::operator==(Utf16View)	2021-10-02 18:32:56 +02:00
Andreas Kling	024367d82e	LibJS+AK: Use Vector<u16, 1> for UTF-16 string storage It's very common to encounter single-character strings in JavaScript on the web. We can make such strings significantly lighter by having a 1-character inline capacity on the Vectors.	2021-10-02 17:39:38 +02:00
Timothy Flynn	70080feab2	AK+LibJS: Implement String.from{CharCode,CodePoint} using UTF-16 strings Most of String.prototype and RegExp.prototype is implemented with UTF-16 so this is to prevent extra copying of the string data.	2021-08-04 11:18:24 +02:00
Timothy Flynn	510bbcd8e0	AK+LibRegex: Add Utf16View::code_point_at and use it in RegexStringView The current method of iterating through the string to access a code point hurts performance quite badly for very large strings. The test262 test "RegExp/property-escapes/generated/Any.js" previously took 3 hours to complete; this one change brings it down to under 10 seconds.	2021-08-04 11:18:24 +02:00
Timothy Flynn	0e6375558d	AK+LibRegex: Partially implement case insensitive UTF-16 comparison This will work for ASCII code points. Unicode case folding will be needed for non-ASCII.	2021-07-23 23:06:57 +01:00
Timothy Flynn	2e45e52993	AK: Add UTF-16 helper methods required for use within LibRegex To be used as a RegexStringView variant, Utf16View must provide a couple more helper methods. It must also not default its assignment operators, because that implicitly deletes move/copy constructors.	2021-07-23 23:06:57 +01:00
Timothy Flynn	9b83cd1abf	AK: Add Utf16View for decoding UTF-16 strings Also includes a way to transcode from and to UTF-8 strings.	2021-07-22 09:10:44 +02:00

21 commits