Commit graph

30 commits

Author SHA1 Message Date
Jonne Ransijn
04920d06f0 AK: Use simdutf when appending UTF-16 to StringBuilder
Adds a fast path for valid UTF-16 using `simdutf`, and falls back to
the slow path for unmatched surrogates.
2024-10-30 10:28:24 +01:00
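The fast-path/slow-path split described in 04920d06f0 can be sketched roughly as follows. This is a minimal standard-library illustration, not the actual AK or `simdutf` code: every function name is hypothetical, the validation step is a scalar stand-in for the SIMD routine, and replacing unmatched surrogates with U+FFFD is an assumption that may differ from what AK's slow path does.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// All names below are hypothetical; none of this is AK's or simdutf's API.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

constexpr char32_t combine_surrogates(char16_t high, char16_t low)
{
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10) + (static_cast<char32_t>(low) - 0xDC00);
}

// Scalar stand-in for the SIMD validation step: true if every surrogate is paired.
bool is_valid_utf16(std::u16string_view s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 >= s.size() || !is_low_surrogate(s[i + 1]))
                return false;
            ++i; // Skip the low half of the pair.
        } else if (is_low_surrogate(s[i])) {
            return false;
        }
    }
    return true;
}

// Append one code point to `out` encoded as UTF-8.
void append_code_point(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

void append_utf16(std::string& out, std::u16string_view s)
{
    if (is_valid_utf16(s)) {
        // Fast path: the input is known to be well formed, so a high
        // surrogate is always followed by its low surrogate.
        for (std::size_t i = 0; i < s.size(); ++i) {
            char32_t cp = s[i];
            if (is_high_surrogate(s[i])) {
                cp = combine_surrogates(s[i], s[i + 1]);
                ++i;
            }
            append_code_point(out, cp);
        }
        return;
    }

    // Slow path: check every unit. Unmatched surrogates become U+FFFD
    // (an assumption made for this sketch only).
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t const unit = s[i];
        if (is_high_surrogate(unit) && i + 1 < s.size() && is_low_surrogate(s[i + 1])) {
            append_code_point(out, combine_surrogates(unit, s[i + 1]));
            ++i;
        } else if (is_high_surrogate(unit) || is_low_surrogate(unit)) {
            append_code_point(out, 0xFFFD);
        } else {
            append_code_point(out, unit);
        }
    }
}
```

In this scalar form the fast path saves little. The benefit in the real change presumably comes from handing validation and conversion of well-formed input to a vectorized library.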
Timothy Flynn
7a17c654d2 AK: Add a method to compute UTF-16 length from a UTF-8 string 2024-07-31 05:55:34 -04:00
Timothy Flynn
71c29504af AK: Support non-native endianness in Utf16View
Utf16View currently assumes host endianness. Add support for specifying
either big or little endianness (which we mostly just pipe through to
simdutf). This will allow using simdutf facilities with LibTextCodec.
2024-07-18 19:43:57 +02:00
Timothy Flynn
32ffe9bbfc AK: Replace UTF-16 validation and length computation with simdutf 2024-07-18 14:46:25 +02:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
Timothy Flynn
1b4a23095c AK: Add a Utf16View::starts_with method
Based heavily on Utf8View::starts_with.
2024-01-04 12:43:10 +01:00
Timothy Flynn
c46ba7e68d AK: Allow constructing a UTF-16 view from a UTF-16 string literal
UTF-16 string literals are a language-level feature. It is convenient to
be able to construct a Utf16View from these strings.
2024-01-04 12:43:10 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Timothy Flynn
370ea9441c AK: Define an alias for Utf16View's iterator type
Utf8View and Utf32View do so already. This allows using these views more
readily in generic code.
2023-11-08 12:54:26 -05:00
MacDue
63b11030f0 Everywhere: Use ReadonlySpan<T> instead of Span<T const> 2023-02-08 19:15:45 +00:00
Timothy Flynn
2eacc7aec1 AK: Add Utf16View::to_utf8 to convert the view to a UTF-8 AK::String 2023-01-09 23:00:24 +00:00
Timothy Flynn
d0403ec14f AK+Everywhere: Rename Utf16View::to_utf8 to to_deprecated_string
A subsequent commit will add to_utf8 back to create an AK::String.
2023-01-09 23:00:24 +00:00
Timothy Flynn
d793262beb AK+Everywhere: Make UTF-16 to UTF-8 converter fallible
This could fail to allocate the underlying storage needed to store the
UTF-8 data. Propagate this error.
2023-01-08 12:13:15 +01:00
Timothy Flynn
1edb96376b AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible
These could fail to allocate the underlying storage needed to store the
UTF-16 data. Propagate these errors.
2023-01-08 12:13:15 +01:00
Timothy Flynn
425c168ded AK+LibJS+LibRegex: Define an alias for UTF-16 string data storage
Instead of writing out "Vector<u16, 1>" everywhere, let's have a name
for it.
2023-01-08 12:13:15 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not bring AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Daniel Bertalan
4296425bd8 Everywhere: Remove redundant inequality comparison operators
C++20 can automatically synthesize `operator!=` from `operator==`, so
there is no point in writing such functions by hand if all they do is
call through to `operator==`.

This fixes a compile error with compilers that implement P2468 (Clang
16 currently). This paper restores the C++17 behavior that if both
`T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be
rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators
makes the rewriting possible again.
See https://reviews.llvm.org/D134529#3853062
2022-11-06 10:25:08 -07:00
Idan Horowitz
44e8c05c67 AK: Add a Utf16View::code_unit_offset_of(Utf16CodePointIterator) helper
This helper can be used to efficiently retrieve the code unit offset of
an active Utf16CodePointIterator.
2022-01-31 21:05:04 +02:00
Timothy Flynn
6efbafa6e0 Everywhere: Update copyrights with my new serenityos.org e-mail :^) 2022-01-31 18:23:22 +00:00
Andreas Kling
216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling
87290e300e AK: Simplify Utf16View::operator==(Utf16View) 2021-10-02 18:32:56 +02:00
Andreas Kling
024367d82e LibJS+AK: Use Vector<u16, 1> for UTF-16 string storage
It's very common to encounter single-character strings in JavaScript on
the web. We can make such strings significantly lighter by having a
1-character inline capacity on the Vectors.
2021-10-02 17:39:38 +02:00
Timothy Flynn
daf559c717 AK: Add a formatter overload for Utf16View 2021-08-10 23:07:50 +02:00
Timothy Flynn
70080feab2 AK+LibJS: Implement String.from{CharCode,CodePoint} using UTF-16 strings
Most of String.prototype and RegExp.prototype is implemented with UTF-16
so this is to prevent extra copying of the string data.
2021-08-04 11:18:24 +02:00
Timothy Flynn
510bbcd8e0 AK+LibRegex: Add Utf16View::code_point_at and use it in RegexStringView
The current method of iterating through the string to access a code
point hurts performance quite badly for very large strings. The test262
test "RegExp/property-escapes/generated/Any.js" previously took 3 hours
to complete; this one change brings it down to under 10 seconds.
2021-08-04 11:18:24 +02:00
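The speedup reported in 510bbcd8e0 comes from replacing a walk from the start of the string with a direct lookup at a code unit index. A minimal sketch of such a lookup over `std::u16string_view` is given below. It is not the AK implementation: the free function and its signature are hypothetical, and returning an unpaired surrogate unchanged is an assumption modelled on JavaScript's `String.prototype.codePointAt`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical free function; AK's Utf16View::code_point_at may differ.
// `index` is a code unit offset, so the lookup is O(1) rather than a
// walk over every code point that precedes it.
char32_t code_point_at(std::u16string_view units, std::size_t index)
{
    assert(index < units.size());

    char16_t const unit = units[index];
    bool const is_high = unit >= 0xD800 && unit <= 0xDBFF;

    if (is_high && index + 1 < units.size()) {
        char16_t const next = units[index + 1];
        if (next >= 0xDC00 && next <= 0xDFFF) {
            // Well-formed surrogate pair: combine both halves.
            return 0x10000 + ((static_cast<char32_t>(unit) - 0xD800) << 10)
                + (static_cast<char32_t>(next) - 0xDC00);
        }
    }

    // BMP code point, or an unpaired surrogate returned as-is
    // (an assumption made for this sketch).
    return unit;
}
```

Calling an iterate-from-the-start lookup once per position costs quadratic time over the whole string, which is consistent with the slowdown the commit message describes for very large inputs.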
Timothy Flynn
0e6375558d AK+LibRegex: Partially implement case insensitive UTF-16 comparison
This will work for ASCII code points. Unicode case folding will be
needed for non-ASCII.
2021-07-23 23:06:57 +01:00
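The limitation noted in 0e6375558d, that only ASCII is handled, can be illustrated with a short sketch. The helper names are hypothetical and the comparison works on code units of `std::u16string_view` rather than on AK's types, so it should be read as an illustration of the idea only.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: lowercases only 'A'..'Z' and leaves everything else alone.
constexpr char16_t to_ascii_lowercase(char16_t unit)
{
    return (unit >= u'A' && unit <= u'Z') ? static_cast<char16_t>(unit + 0x20) : unit;
}

// Hypothetical helper: case-insensitive for ASCII letters, exact for all other units.
bool equals_ignoring_ascii_case(std::u16string_view a, std::u16string_view b)
{
    if (a.size() != b.size())
        return false;

    for (std::size_t i = 0; i < a.size(); ++i) {
        if (to_ascii_lowercase(a[i]) != to_ascii_lowercase(b[i]))
            return false;
    }
    return true;
}
```

With this approach `u"Hello"` matches `u"hELLO"`, but `u"\u00C9"` (É) does not match `u"\u00E9"` (é). Closing that gap is what Unicode case folding would be needed for.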
Timothy Flynn
2e45e52993 AK: Add UTF-16 helper methods required for use within LibRegex
To be used as a RegexStringView variant, Utf16View must provide a couple
more helper methods. It must also not default its assignment operators,
because that implicitly deletes move/copy constructors.
2021-07-23 23:06:57 +01:00
Timothy Flynn
9b83cd1abf AK: Add Utf16View for decoding UTF-16 strings
Also includes a way to transcode from and to UTF-8 strings.
2021-07-22 09:10:44 +02:00