Commit graph

192 commits

Author SHA1 Message Date
Andreas Kling
073bcfd386 AK+LibWeb: Add {Fly,}String::to_ascii_{upper,lower}_case()
These don't have to worry about the input not being valid UTF-8 and
so can be infallible (and can even return self if no changes needed.)

We use this instead of Infra::to_ascii_{upper,lower}_case in LibWeb.
2024-10-14 20:47:35 +02:00
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere
Some checks are pending
CI / Lagom (false, FUZZ, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (false, NO_FUZZ, macos-14, macOS, Clang) (push) Waiting to run
CI / Lagom (false, NO_FUZZ, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (true, NO_FUZZ, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (macos-14, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Push notes / build (push) Waiting to run
2024-10-04 13:19:50 +02:00
Shannon Booth
b3bf5c4ea8 AK: Add BOM handling to String::from_utf8_with_replacement_character 2024-08-12 06:38:58 -04:00
Shannon Booth
1e8cc97b73 AK: Add fast-path in from_utf8_with_replacement_character for utf-8
This ports the same optimization which was made in
1a46d8df5f to this function as well.
2024-08-12 06:38:58 -04:00
Shannon Booth
033ea0e7fb AK: Add String::from_utf8_with_replacement_character
This takes a byte sequence and converts it to a UTF-8 string with the
replacement character.
2024-08-10 10:39:43 +02:00
Andrew Kaster
45301e8169 Everywhere: Remove AK_DONT_REPLACE_STD macro
Let's just always include `<utility>`. Placing our own incompatible with
the STL declaration of these functions in AK was always fishy to begin
with.
2024-07-30 18:38:02 -06:00
Timothy Flynn
74d644a216 AK: Explicitly check for null data in Utf16View
The underlying CPU-specific instructions for operating on UTF-16 strings
behave differently for null inputs. Add an explicit check for this state
for consistency.
2024-07-21 19:57:07 +02:00
Timothy Flynn
29879a69a4 AK: Construct Strings from StringBuilder without re-allocating the data
Currently, invoking StringBuilder::to_string will re-allocate the string
data to construct the String. This is wasteful both in terms of memory
and speed.

The goal here is to simply hand the string buffer over to String, and
let String take ownership of that buffer. To do this, StringBuilder must
have the same memory layout as Detail::StringData. This layout is just
the members of the StringData class followed by the string itself.

So when a StringBuilder is created, we reserve sizeof(StringData) bytes
at the front of the buffer. StringData can then construct itself into
the buffer with placement new.

Things to note:
* StringData must now be aware of the actual capacity of its buffer, as
  that can be larger than the string size.
* We must take care not to pass ownership of inlined string buffers, as
  these live on the stack.
2024-07-20 06:45:49 +02:00
Timothy Flynn
71c29504af AK: Support non-native endianness in Utf16View
Utf16View currently assumes host endianness. Add support for specifying
either big or little endianness (which we mostly just pipe through to
simdutf). This will allow using simdutf facilities with LibTextCodec.
2024-07-18 19:43:57 +02:00
Timothy Flynn
0c14a9417a AK: Replace converting to and from UTF-16 with simdutf
The one behavior difference is that we will now actually fail on invalid
code units with Utf16View::to_utf8(AllowInvalidCodeUnits::No). It was
arguably a bug that this wasn't already the case.
2024-07-18 14:46:25 +02:00
Andreas Kling
ebe6ec6069 AK: Check for u32 overflow in String::repeated()
I don't know why this was checking for size_t overflow, but it was
tripping up ASAN malloc() checks by passing a way-too-large size.
2024-05-07 09:15:40 +02:00
Jess
ecb7d4b40f LibJS: Throw RangeError in StringPrototype::repeat if OOM
currently crashes with an assertion failure in `String::repeated` if
malloc can't serve a `count * input_size` sized request, so add
`String::repeated_with_error` to propagate the error.
2024-04-20 19:23:46 -04:00
Timothy Flynn
de80f544d8 AK: Disallow calling String methods that return a view on rvalues
This prevents, for example:

    StringView view = "foo"_string.bytes_as_string_view();

This prevents a class of potential UAF.
2024-04-04 11:23:21 +02:00
Dan Klishch
870a947040 AK: Remove StringInternals.h
Since we do not expose memory layout anymore in StringBase, there is no
need to keep StringData public.
2024-01-21 16:16:15 -07:00
Dan Klishch
fa52f68142 AK: Store data in FlyString as StringBase
Unfortunately, it is not clear to me how to split this commit into
several atomic ones.
2024-01-21 16:16:15 -07:00
Dan Klishch
e7700e16ee AK: Forward substring creation with shared superstring to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch
5d6cd65e29 AK: Simplify String::repeated by leveraging StringBase helpers 2024-01-21 16:16:15 -07:00
Dan Klishch
7dbe357e9f AK: Simplify String::from_stream by leveraging StringBase helpers 2024-01-21 16:16:15 -07:00
Dan Klishch
dcd1fda9c8 AK: Introduce StringBase::replace_with_new_{short_,}string 2024-01-21 16:16:15 -07:00
Dan Klishch
d6290c4684 AK: Move String::hash() and String::String() to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch
1b09a1851e AK: Move String::~String() and String::destroy_string() to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch
54d149bc25 AK: Move String::bytes() and String::operator==(String) to StringBase
The idea is to eventually get rid of protected state in StringBase. To
do this, we first need to remove all references to m_data and
m_short_string from String.
2024-01-21 16:16:15 -07:00
Dan Klishch
4364a28d3d AK: Move data fields from AK::String to a newly created AK::StringBase
This starts separating memory management of string data and string
utilities like `String::formatted`. This would also allow to reuse the
same storage in `DeprecatedString` in the future.
2024-01-21 16:16:15 -07:00
Dan Klishch
6e2f627cb3 AK: Move StringData from String.cpp to a newly created StringInternals.h
This is done to allow using it in files other than AK/String.cpp.
2024-01-21 16:16:15 -07:00
Andreas Kling
3c039903fb LibTextCodec+AK: Don't validate UTF-8 strings twice
UTF8Decoder was already converting invalid data into replacement
characters while converting, so we know for sure we have valid UTF-8
by the time conversion is finished.

This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
2023-12-30 13:49:50 +01:00
Andreas Kling
a285e36041 LibJS+AK: Make String.prototype.repeat() way faster
Instead of using a StringBuilder, add a String::repeated(String, N)
overload that takes advantage of knowing it's already all UTF-8.

This makes the following microbenchmark go 4x faster:

    "foo".repeat(100_000_000)

And for single character strings, we can even go 10x faster:

    "x".repeat(100_000_000)
2023-12-30 13:49:50 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Timothy Flynn
6aa334767f AK: Ensure assigned-to Strings are dereferenced if needed
If we assign to an existing non-short string, we must dereference its
StringData object to prevent leaking that data.
2023-11-28 16:38:18 +01:00
Andreas Kling
0902f552a3 AK: Bring some missing DeprecatedString API over to String
Specifically, case sensitivity parameters for starts/ends with,
and the equals_ignoring_ascii_case() helper.
2023-11-04 21:28:30 +01:00
Andreas Kling
1e820385d9 AK: Add case-insensitive hashing for the new String classes
Bringing over this functionality from DeprecatedString.
2023-09-06 11:29:03 -04:00
aryanbaburajan
a94c0eea94 AK: Add trim_ascii_whitespace method to String 2023-08-06 22:21:10 +02:00
Tim Schumacher
a3f73e7d85 AK: Rename Stream::read_entire_buffer to Stream::read_until_filled
No functional changes.
2023-03-13 15:16:20 +00:00
Andreas Kling
d517e7fb3a AK: Make FlyString::hash() use the cached hash in StringData if possible
This avoids rehashing the string every time.
2023-03-09 21:54:59 +01:00
Timothy Flynn
515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn
f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Timothy Flynn
da0d000909 AK: Ensure short String instances are valid UTF-8
We are currently only validating long strings.
2023-03-03 11:46:42 -05:00
Linus Groh
45dc3d8a3e AK: Add String::ends_with{,_bytes}() 2023-03-03 11:02:21 +00:00
Ali Mohammad Pur
79e4027480 AK: Add two starts_with{bytes,}() APIs to String 2023-02-28 15:52:24 +03:30
Tim Schumacher
9096b4d893 AK: Ensure that we fill the whole String when reading from a Stream 2023-02-21 22:28:15 -07:00
Andrew Kaster
0ea697ace5 AK: Add String::from_stream method
The caller is responsible for determining how long the string is that
they want to read.
2023-02-21 10:57:44 +01:00
Andreas Kling
e08c55dd8d AK: Make String const-correct internally 2023-02-21 00:54:04 +01:00
Andreas Kling
d0697d350d AK: Fix 64-bit alignment issue in shared-superstring substrings
Thanks to Timothy Flynn for the test!

Fixes #17141
2023-02-18 09:12:46 -05:00
Timothy Flynn
c59268d15b AK: Add String::trim 2023-01-28 00:13:46 +00:00
Timothy Flynn
cccaa94767 AK: Add String::join 2023-01-28 00:13:46 +00:00
Timothy Flynn
c35b1371a3 AK: Add an overload of String::find_byte_offset for StringView 2023-01-27 18:00:17 +00:00
Timothy Flynn
76fd5f2756 AK: Add convenience substring wrappers to String to exclude a length
These overloads exist on other string classes and are used throughout
the code base.
2023-01-24 16:23:50 -05:00
Timothy Flynn
427b82065c AK: Add a method to create a String with a repeated code point 2023-01-24 16:23:50 -05:00
Timothy Flynn
d50724956e AK: Add a method to find the byte offset of a code point 2023-01-24 16:23:50 -05:00
Timothy Flynn
ef275e25b8 AK: Reduce String's allocated data by one byte
This was copied from allocation_size_for_stringimpl, which had to ensure
the string is null-terminated. String makes no such guarantee.
2023-01-22 20:27:52 +00:00
Timothy Flynn
8aca8e82cb AK: Change String's default constructor to be constant
This allows creating expressions such as:

    constexpr Array<String, 10> {};
2023-01-22 01:03:13 +00:00