Commit graph

126 commits

Author SHA1 Message Date
Andreas Kling
a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Daniel Bertalan
4296425bd8 Everywhere: Remove redundant inequality comparison operators
C++20 can automatically synthesize `operator!=` from `operator==`, so
there is no point in writing such functions by hand if all they do is
call through to `operator==`.

This fixes a compile error with compilers that implement P2468 (Clang
16 currently). This paper restores the C++17 behavior that if both
`T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be
rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators
makes the rewriting possible again.
See https://reviews.llvm.org/D134529#3853062
2022-11-06 10:25:08 -07:00
demostanis
3e8b5ac920 AK+Everywhere: Turn bool keep_empty to an enum in split* functions 2022-10-24 23:29:18 +01:00
davidot
6fd8e96d53 AK: Add to_{double, float} convenience functions to all string types
These are guarded with #ifndef KERNEL, since doubles (and floats) are
not allowed in KERNEL mode.
In StringUtils there is convert_to_floating_point which does have a
template parameter incase you have a templated type.
2022-10-23 15:48:45 +02:00
Hendiadyoin1
154871834b AK: Add a helper to get the last split-group 2022-07-15 12:42:43 +02:00
sin-ack
3f8060d859 AK: Remove String <-> char const* comparison operators
During the removal of StringView(char const*), all users of these
functions were removed, and they are of dubious value (relying on
implicit StringView conversion).
2022-07-12 23:11:35 +02:00
DexesTTP
7ceeb74535 AK: Use an enum instead of a bool for String::replace(all_occurences)
This commit has no behavior changes.

In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
2022-07-06 11:12:45 +02:00
huttongrabiel
8ffa860bc3 AK: Add invert_case() and invert_case(StringView)
In the given String, invert_case() swaps lowercase characters with
uppercase ones and vice versa.
2022-05-26 21:51:23 +01:00
Jelle Raaijmakers
1577a8ba42 AK: Remove KERNEL check from String
Since we no longer use `String` inside of the kernel code, we can drop
this `#ifndef`.
2022-04-10 12:08:31 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Hendiadyoin1
820e03e8d4 AK: Add a case insensitive of is_one_of to String[View] 2022-03-21 10:48:17 +01:00
Andreas Kling
dd7eb3d6d8 AK: Add String::split_view(Function<bool(char)>)
This allows you to split around a custom separator, and enables
expressive code like this:

    string.split_view(is_ascii_space);
2022-02-25 19:38:31 +01:00
Linus Groh
b253bca807 AK: Add optional format string parameter to String{,Builder}::join()
Allow specifying a custom format string that's being used for each item
instead of hardcoding "{}".
2022-02-23 21:53:30 +00:00
Andreas Kling
4b900bc100 AK: Add fast path in String::trim() and String::trim_whitespace()
If the trimmed string would be the entire string, just return *this
instead of creating a new StringImpl.
2022-02-19 14:45:59 +01:00
Andreas Kling
2dd3b54827 AK: Make CaseInsensitiveStringTraits allocation-free
Instead of calling String::to_lowercase(), do case-insensitive hashing
and comparison.
2022-02-19 14:45:59 +01:00
Michel Hermier
e986ce961a AK: Avoid impl initialization before assignment in Stringconstructor 2022-01-23 13:29:12 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling
5f7d008791 AK+Everywhere: Stop including Vector.h from StringView.h
Preparation for using Error.h from Vector.h. This required moving some
things out of line.
2021-11-10 21:58:58 +01:00
Idan Horowitz
6704961c82 AK: Replace the mutable String::replace API with an immutable version
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.

As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
2021-09-11 20:36:43 +03:00
Idan Horowitz
6d2b003b6e AK: Make String::count not use strstr and take a StringView
This was needlessly copying StringView arguments, and was also using
strstr internally, which meant it was doing a bunch of unnecessary
strlen calls on it. This also moves the implementation to StringUtils
to allow API consistency between String and StringView.
2021-09-11 20:36:43 +03:00
Brian Gianforcaro
fee2a03ba9 AK: Pass AK::Format TypeErasedFormatParams by reference in AK::String
This silences a overeager warning in sonar cloud, warning that
slicing could occur with `VariadicFormatParams` which derives from
`TypeErasedFormatParams`.

Reference:
https://sonarcloud.io/project/issues?id=SerenityOS_serenity&issues=AXuVPBW3k92xXUF3qXTE&open=AXuVPBW3k92xXUF3qXTE

This is a continuation of f0b3aa0331.
2021-09-01 18:06:14 +02:00
Timothy Flynn
262e412634 AK: Implement method to convert a String/StringView to title case
This implementation preserves consecutive spaces in the orginal string.
2021-08-26 22:04:09 +01:00
Brian Gianforcaro
f2d684fc24 AK: Annotate String.count as [[nodiscard]] 2021-08-13 11:08:11 +02:00
Jean-Baptiste Boric
7a9d05c24c AK: Add contains(char) method to String 2021-08-12 00:41:13 +02:00
Timothy Flynn
011514a384 AK: Fix declaration of {String,StringView}::is_one_of
The declarations need to consume the variadic parameters as "Ts&&..."
for the parameters to be forwarding references.
2021-08-02 21:02:09 +04:30
Tobias Christiansen
87033ce7d1 AK: Add generation of roman numerals to AK::String
We now can generate roman numbers using String::roman_number_from()
similar to String::bijective_base_from().
2021-07-04 22:17:03 +02:00
Max Wipfli
9cc35d1ba3 AK: Implement String::find_any_of() and StringView::find_any_of()
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
2021-07-02 21:54:21 +02:00
Max Wipfli
17eddf3ac4 AK: Add input bounds checking to String::substring()
This checks for overflow in String::substring(). It also rearranges some
declarations in the header.
2021-07-02 21:54:21 +02:00
Max Wipfli
268d81a56c AK: Add String::find_last() and inline String::find() methods
This adds the String::find_last() as wrapper for StringUtils::find_last,
which is another step in harmonizing the String and StringView APIs
where possible.

This also inlines the find() methods, as they are simple wrappers around
StringUtils functions without any additional logic.
2021-07-02 21:54:21 +02:00
Max Wipfli
d7a104c27c AK: Implement StringView::find_all()
This implements the StringView::find_all() method by re-implemeting the
current method existing for String in StringUtils, and using that
implementation for both String and StringView.

The rewrite uses memmem() instead of strstr(), so the String::find_all()
argument type has been changed from String to StringView, as the null
byte is no longer required.
2021-07-02 21:54:21 +02:00
sin-ack
3abcfcc178 AK: Add a way to disable the trimming of whitespace in to_*int
This behavior might not always be desirable, and so this patch adds a
way to disable it.
2021-06-18 19:18:15 +01:00
Ali Mohammad Pur
824a40e95b AK: Inline *String::is_one_of<Ts...>()
Previously this was generating a crazy number of symbols, and it was
also pretty-damn-slow as it was defined recursively, which made the
compiler incapable of inlining it (due to the many many layers of
recursion before it terminated).
This commit replaces the recursion with a pack expansion and marks it
always-inline.
2021-06-04 12:57:14 +02:00
Gunnar Beutner
a4f320c76b AK: Allow inlining more string functions 2021-06-03 08:06:51 +02:00
Max Wipfli
0e4f7aa8e8 AK: Add trim() method to String, StringView and StringUtils
The methods added make it possible to use the trim mechanism with
specified characters, unlike trim_whitespace(), which uses predefined
characters.
2021-06-01 09:28:05 +02:00
Max Wipfli
a557f83f8c AK: Verify that m_impl is non-null in String::operator[]
This helps to find bugs where null strings are indexed into with
operator[], as this would previously only report a RefPtr null
dereference.
2021-05-30 17:41:49 +01:00
Matthew Olsson
777c232e16 AK: Add String::repeated(StringView, size_t count) 2021-05-25 00:24:09 +04:30
Andreas Kling
de395a3df2 AK+Everywhere: Consolidate String::index_of() and String::find()
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.

Also add some tests to cover basic behavior.
2021-05-24 11:59:18 +02:00
Maciej Zygmanowski
80077cea86 AK: Add String::find_all() and String::count() 2021-05-19 20:51:51 +01:00
Gunnar Beutner
53d0150827 AK+Userland: Remove nullability feature for the ByteBuffer type
Nobody seems to use this particular feature, in fact there were some
bugs which were uncovered by removing operator bool.
2021-05-16 17:49:42 +02:00
Tobias Christiansen
4016e04061 AK: Move bijective-base-conversion into AK/String
This allows everybody to create a String version of their number
in a arbitrary bijective base. Bijective base meaning that the mapping
doesn't have a 0. In the usual mapping to the alphabet the follower
after 'Z' is 'AA'.
The mapping using the (uppercase) alphabet is used as a standard but
can be overridden specifying 'base' and 'map'.

The code was directly yanked from the Spreadsheet.
2021-05-01 01:19:40 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
edf0b14e23 AK: Remove String::format()
There are no more clients of this function, everyone has been converted
to String::formatted().
2021-04-21 23:49:03 +02:00
Andreas Kling
b0ccb5ba9d AK: Decorate most of String's API's with [[nodiscard]] 2021-04-21 23:49:01 +02:00
AnotherTest
a6e4482080 AK+Everywhere: Make StdLibExtras templates less wrapper-y
This commit makes the user-facing StdLibExtras templates and utilities
arguably more nice-looking by removing the need to reach into the
wrapper structs generated by them to get the value/type needed.
The C++ standard library had to invent `_v` and `_t` variants (likely
because of backwards compat), but we don't need to cater to any codebase
except our own, so might as well have good things for free. :^)
2021-04-10 21:01:31 +02:00
Linus Groh
e265054c12 Everywhere: Remove a bunch of redundant 'AK::' namespace prefixes
This is basically just for consistency, it's quite strange to see
multiple AK container types next to each other, some with and some
without the namespace prefix - we're 'using AK::Foo;' a lot and should
leverage that. :^)
2021-02-26 16:59:56 +01:00
AnotherTest
7c2754c3a6 AK+Kernel+Userland: Enable some more compiletime format string checks
This enables format string checks for three more functions:
- String::formatted()
- Builder::appendff()
- KBufferBuilder::appendff()
2021-02-23 13:59:33 +01:00
Linus Groh
4fafe14691 AK: Add String{,Utils}::to_snakecase()
This is an improved version of WrapperGenerator's snake_name(), which
seems like the kind of thing that could be useful elsewhere but would
end up getting duplicated - so let's add this to AK::String instead,
like to_{lowercase,uppercase}().
2021-02-21 19:47:47 +01:00