DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g. ISO8859-1).
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
a UTF-8 iterator.
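To illustrate why byte-wise length() and operator[] give the wrong answer for non-ASCII text, here is a minimal sketch using std::string rather than AK types. The helper count_code_points() is a hypothetical name, and it assumes the input has already been validated as UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a byte string that is assumed to already be valid UTF-8.
// Continuation bytes look like 10xxxxxx; any other byte starts a new code point.
inline std::size_t count_code_points(std::string const& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Usage: byte length and code point count differ once non-ASCII text is involved.
inline void example()
{
    std::string const text = "h\xC3\xA9llo"; // "héllo", where é is encoded as two bytes
    assert(text.size() == 6);
    assert(count_code_points(text) == 5);
}
```

Code that indexes by byte would split the two-byte sequence in the middle, which is the kind of bug the new String tries to make harder to write.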
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
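The size limits of the small string optimization follow from pointer-sized storage with one byte reserved for bookkeeping. The following is only a sketch of that arithmetic; ShortString and its layout are hypothetical and not necessarily how String stores short strings internally:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical layout: one byte for the length, the remaining bytes for the characters.
struct ShortString {
    static constexpr std::size_t max_length = sizeof(void*) - 1;

    unsigned char length { 0 };
    char data[max_length] {};

    static bool fits(std::string const& text) { return text.size() <= max_length; }

    static ShortString from(std::string const& text)
    {
        assert(fits(text));
        ShortString result;
        result.length = static_cast<unsigned char>(text.size());
        if (!text.empty())
            std::memcpy(result.data, text.data(), text.size());
        return result;
    }

    std::string to_std_string() const { return std::string(data, length); }
};

// The whole object occupies exactly as much space as a pointer would.
static_assert(sizeof(ShortString) == sizeof(void*), "ShortString must be pointer-sized");
```

With this layout max_length works out to 3 when pointers are 4 bytes and 7 when they are 8 bytes, matching the limits quoted above.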
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
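A rough sketch of the shared-superstring idea, using std::shared_ptr in place of AK's ref-counting. SharedSubstring and its members are hypothetical names; the point is only to show why a view into the middle of a larger string cannot promise a trailing null byte:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

class SharedSubstring {
public:
    explicit SharedSubstring(std::string text)
        : m_storage(std::make_shared<std::string>(std::move(text)))
        , m_offset(0)
        , m_length(m_storage->size())
    {
    }

    // Shares the storage (one more reference) instead of copying the characters.
    SharedSubstring substring(std::size_t offset, std::size_t length) const
    {
        assert(offset <= m_length && length <= m_length - offset);
        return SharedSubstring(m_storage, m_offset + offset, length);
    }

    char const* data() const { return m_storage->data() + m_offset; }
    std::size_t size() const { return m_length; }
    std::string to_std_string() const { return m_storage->substr(m_offset, m_length); }
    long storage_use_count() const { return m_storage.use_count(); }

private:
    SharedSubstring(std::shared_ptr<std::string const> storage, std::size_t offset, std::size_t length)
        : m_storage(std::move(storage))
        , m_offset(offset)
        , m_length(length)
    {
    }

    std::shared_ptr<std::string const> m_storage;
    std::size_t m_offset;
    std::size_t m_length;
};

inline void example()
{
    SharedSubstring whole("hello world");
    SharedSubstring part = whole.substring(0, 5);
    assert(part.to_std_string() == "hello");
    assert(whole.storage_use_count() == 2);
    // The byte right after the view is the space from the superstring, not '\0'.
    assert(part.data()[part.size()] == ' ');
}
```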
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.
This is a step towards integrating Jakt and AK types.
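The general pattern of an opt-out macro guarding global using-declarations might look roughly like the following. The names USING_MYLIB_GLOBALLY, MyLib and Counter are made up for illustration and are not the actual AK code:

```cpp
#include <cassert>

// Hypothetical macro: enabled by default, but a build flag such as
// -DUSING_MYLIB_GLOBALLY=0 can override it before this point.
#ifndef USING_MYLIB_GLOBALLY
#    define USING_MYLIB_GLOBALLY 1
#endif

namespace MyLib {

struct Counter {
    int value { 0 };
    void increment() { ++value; }
};

}

#if USING_MYLIB_GLOBALLY
// Pull the type into the global namespace only when the macro is enabled.
using MyLib::Counter;
#endif

inline int example()
{
    Counter counter; // Unqualified use relies on the using-declaration above.
    counter.increment();
    assert(counter.value == 1);
    return counter.value;
}
```

With the macro set to 0, callers would have to spell out MyLib::Counter, which is what lets another project use the same short names for its own types.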
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
Because AK/Concepts.h includes AK/Forward.h and concepts cannot be
forward declared, slightly loosen the FixedPoint template arguments
so that we can forward declare it in AK/Forward.h.
Let's bring this class back, but without the confusing resize() API.
A FixedArray<T> is simply a fixed-size array of T.
The size is provided at run-time, unlike Array<T> where the size is
provided at compile-time.
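The run-time versus compile-time distinction can be sketched with standard library types. RuntimeFixedArray below is a hypothetical stand-in for illustration, not the actual FixedArray implementation:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// The size is chosen when the object is constructed and never changes afterwards.
template<typename T>
class RuntimeFixedArray {
public:
    explicit RuntimeFixedArray(std::size_t size)
        : m_size(size)
        , m_elements(std::make_unique<T[]>(size)) // elements are value-initialized
    {
    }

    std::size_t size() const { return m_size; }

    T& operator[](std::size_t index)
    {
        assert(index < m_size);
        return m_elements[index];
    }

    T const& operator[](std::size_t index) const
    {
        assert(index < m_size);
        return m_elements[index];
    }

private:
    std::size_t m_size;
    std::unique_ptr<T[]> m_elements;
};

inline void example(std::size_t size_known_only_at_run_time)
{
    RuntimeFixedArray<int> dynamic(size_known_only_at_run_time);
    std::array<int, 3> fixed {}; // the 3 has to be a compile-time constant
    assert(dynamic.size() == size_known_only_at_run_time);
    assert(fixed.size() == 3);
}
```

Note that the sketch deliberately has no resize(); the only way to get a different size is to construct a new object.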
This changes JsonObject to use the new OrderedHashMap instead of an
extra vector for tracking the insertion order.
This also adds a default value for the KeyTraits template argument in
OrderedHashMap. Furthermore, it fixes two cases where code iterating
over a JsonObject relied on the value argument being copied before
invoking the callback.
All usages of AK::InlineLinkedList have been converted to
AK::IntrusiveList. So it's time to retire our old friend.
Note: The whitespace-only change in AK/CMakeLists.txt is there to
force CMake to re-glob the header files in the AK directory, so that
incremental builds keep working when folks git pull this change locally.
Otherwise they'll get errors, because CMake will attempt to install
a file which no longer exists.
This commit makes it possible to instantiate `Vector<T&>` and use it
to store references to `T` in a vector.
All non-pointer observers are made to return the reference, and the
pointer observers simply yield the underlying pointer.
Note that the 'find_*' methods act on the values and not the pointers
that are stored in the vector.
This commit also makes errors in various vector methods much more
readable by directly using requires-clauses on them.
And finally, it should be noted that Vector cannot hold temporaries :^)
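For readers more familiar with the standard library, the closest analogue is a std::vector of std::reference_wrapper. The sketch below is not the AK implementation; it only shows the same two properties: elements alias the original objects, and searching compares values rather than addresses. contains_value() is a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Searches by value: each stored reference converts to the int it refers to.
inline bool contains_value(std::vector<std::reference_wrapper<int>> const& references, int value)
{
    return std::any_of(references.begin(), references.end(),
        [&](int const& element) { return element == value; });
}

inline void example()
{
    int first = 1;
    int second = 2;
    std::vector<std::reference_wrapper<int>> references { first, second };

    // Writing through the vector changes the original object.
    references[0].get() = 10;
    assert(first == 10);

    // Lookup is by value, not by the address being stored.
    assert(contains_value(references, 2));
    assert(!contains_value(references, 1));
}
```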
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.
This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T>, however, a byte
buffer's data may be uninitialized.
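The difference between the reference-like copying described above and value-like copying can be shown with standard library types. This is only an analogy; SharedBuffer and ValueBuffer are hypothetical aliases and not how ByteBuffer is implemented:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using ValueBuffer = std::vector<std::uint8_t>;
using SharedBuffer = std::shared_ptr<std::vector<std::uint8_t>>;

inline void example()
{
    // Reference-like: both handles point at the same bytes.
    SharedBuffer shared_a = std::make_shared<std::vector<std::uint8_t>>(std::size_t { 4 }, std::uint8_t { 0 });
    SharedBuffer shared_b = shared_a;
    (*shared_b)[0] = 0xFF;
    assert((*shared_a)[0] == 0xFF);

    // Value-like: the copy owns its own bytes.
    ValueBuffer value_a(std::size_t { 4 }, std::uint8_t { 0 });
    ValueBuffer value_b = value_a;
    value_b[0] = 0xFF;
    assert(value_a[0] == 0);

    // Like Vector, a value buffer can hold spare capacity beyond its size.
    value_a.reserve(64);
    assert(value_a.capacity() >= 64 && value_a.size() == 4);
}
```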
With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
These structs can be inconsistent, for example if the number of microseconds is
negative or larger than 1'000'000. Therefore, they should not be copied as-is.
Use copy_time_from_user instead.
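The kind of check this implies can be sketched in plain C++. TimeValue and checked_copy() are hypothetical names, and the sketch assumes that invalid input is rejected; the real copy_time_from_user may behave differently (for example by reporting an error code):

```cpp
#include <cassert>

// Hypothetical stand-in for a timeval-like struct coming from an untrusted source.
struct TimeValue {
    long long seconds;
    long long microseconds;
};

// Copies the value only if the microseconds field is in the range [0, 1'000'000).
// Returns false and leaves `destination` untouched otherwise.
inline bool checked_copy(TimeValue const& untrusted, TimeValue& destination)
{
    constexpr long long microseconds_per_second = 1'000'000;
    if (untrusted.microseconds < 0 || untrusted.microseconds >= microseconds_per_second)
        return false;
    destination = untrusted;
    return true;
}

inline void example()
{
    TimeValue destination { 0, 0 };
    assert(checked_copy({ 1, 500'000 }, destination));
    assert(destination.seconds == 1 && destination.microseconds == 500'000);
    assert(!checked_copy({ 1, -1 }, destination));
    assert(!checked_copy({ 1, 2'000'000 }, destination));
}
```

The sketch also rejects a value of exactly 1'000'000, since that is a full second and belongs in the seconds field.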
This was weird. It turns out these classes were using int indexes and
sizes despite being derived from Vector which uses size_t.
Make the universe right again by using size_t here as well.
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().
Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
This is useful for collecting statistics, e.g.
Atomic<unsigned, MemoryOrder::memory_order_relaxed> would allow
using operators such as ++ to use relaxed semantics throughout
without having to explicitly call fetch_add with the memory order.
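A minimal sketch of the idea on top of std::atomic, assuming a wrapper that takes the default memory order as a template parameter. OrderedAtomic is a hypothetical name and this is not the AK::Atomic implementation:

```cpp
#include <atomic>
#include <cassert>

template<typename T, std::memory_order DefaultOrder = std::memory_order_seq_cst>
class OrderedAtomic {
public:
    explicit OrderedAtomic(T initial = T {})
        : m_value(initial)
    {
    }

    // The operators use the order from the template argument instead of seq_cst.
    T operator++() { return m_value.fetch_add(1, DefaultOrder) + 1; }
    T operator++(int) { return m_value.fetch_add(1, DefaultOrder); }
    T load() const { return m_value.load(DefaultOrder); }

private:
    std::atomic<T> m_value;
};

inline unsigned example()
{
    // A statistics counter: relaxed ordering throughout, no explicit fetch_add calls.
    OrderedAtomic<unsigned, std::memory_order_relaxed> hits;
    ++hits;
    hits++;
    assert(hits.load() == 2u);
    return hits.load();
}
```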
Problem:
- Building with clang is broken because of the `struct` vs `class`
mismatch between the definition and declaration.
Solution:
- Change `class` to `struct` in the forward declaration.
There are three classes available that share the functionality of
BufferStream:
1. InputMemoryStream is for reading from static buffers. Example:
Bytes input = /* ... */;
InputMemoryStream stream { input };
LittleEndian<u32> little_endian_value;
stream >> little_endian_value;
u32 host_endian_value;
stream >> host_endian_value;
SomeComplexStruct complex_struct;
stream >> Bytes { &complex_struct, sizeof(complex_struct) };
2. OutputMemoryStream is for writing to static buffers. Example:
Array<u8, 4096> buffer;
OutputMemoryStream stream { buffer };
stream << LittleEndian<u32> { 42 };
stream << ReadonlyBytes { &complex_struct, sizeof(complex_struct) };
foo(stream.bytes());
3. DuplexMemoryStream is for writing to dynamic buffers, can also be used
as an intermediate buffer by reading from it directly. Example:
DuplexMemoryStream stream;
stream << NetworkOrdered<u32> { 13 };
stream << NetworkOrdered<u64> { 22 };
NetworkOrdered<u32> value;
stream >> value;
ASSERT(value == 13);
foo(stream.copy_into_contiguous_buffer());
Unlike BufferStream, these streams do not use a fixed endianness
(BufferStream used little endian); the endianness has to be specified explicitly.
There are helper types in <AK/Endian.h>.
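What "explicitly specified" means for the bytes on the wire can be sketched without any AK types. The two helpers below are hypothetical free functions, not the LittleEndian<T> or NetworkOrdered<T> wrappers themselves:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Least significant byte first, regardless of the host's byte order.
inline std::array<std::uint8_t, 4> encode_little_endian(std::uint32_t value)
{
    return { {
        static_cast<std::uint8_t>(value & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 24) & 0xFF),
    } };
}

// Most significant byte first (network byte order).
inline std::array<std::uint8_t, 4> encode_big_endian(std::uint32_t value)
{
    return { {
        static_cast<std::uint8_t>((value >> 24) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>(value & 0xFF),
    } };
}

inline void example()
{
    auto little = encode_little_endian(0x11223344);
    auto big = encode_big_endian(0x11223344);
    assert(little[0] == 0x44 && little[3] == 0x11);
    assert(big[0] == 0x11 && big[3] == 0x44);
}
```

Writing a plain integer to a stream, by contrast, emits whatever byte order the host happens to use.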
OutputMemoryStream was originally a proxy for DuplexMemoryStream that
did not expose any reading API.
Now I need to add another class that is like OutputMemoryStream but only
for static buffers. My first idea was to make OutputMemoryStream do that
too, but I think it's much better to have a distinct class for that.
I originally wanted to call that class FixedOutputMemoryStream but that
name is really cumbersome and it's a bit unintuitive because
InputMemoryStream is already reading from a fixed buffer.
So let's just use DuplexMemoryStream instead of OutputMemoryStream for
any dynamic stuff and create a new OutputMemoryStream for static
buffers.
This class is similar to BufferStream because it is possible to both
read and write to it. However, it differs in the following ways:
- DuplexMemoryStream keeps a history of 64KiB and discards the rest,
whereas BufferStream always keeps everything around.
- DuplexMemoryStream tracks reading and writing separately, so the
following is valid:
DuplexMemoryStream stream;
stream << 42;
int value;
stream >> value;
For BufferStream it would read:
BufferStream stream;
stream << 42;
int value;
stream.seek(0);
stream >> value;
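The separate read and write positions can be sketched as follows. DuplexBuffer is a hypothetical toy class built on std::vector; it copies values in host byte order and leaves out the 64KiB history limit entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

class DuplexBuffer {
public:
    // Writing always appends at the end of the buffer.
    template<typename T>
    DuplexBuffer& operator<<(T const& value)
    {
        static_assert(std::is_trivially_copyable<T>::value, "only plain data can be streamed");
        auto const* bytes = reinterpret_cast<unsigned char const*>(&value);
        m_data.insert(m_data.end(), bytes, bytes + sizeof(T));
        return *this;
    }

    // Reading continues from its own offset, independent of how much was written.
    template<typename T>
    DuplexBuffer& operator>>(T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value, "only plain data can be streamed");
        assert(m_read_offset + sizeof(T) <= m_data.size());
        std::memcpy(&value, m_data.data() + m_read_offset, sizeof(T));
        m_read_offset += sizeof(T);
        return *this;
    }

    std::size_t unread_bytes() const { return m_data.size() - m_read_offset; }

private:
    std::vector<unsigned char> m_data;
    std::size_t m_read_offset { 0 };
};

inline void example()
{
    DuplexBuffer stream;
    stream << 42;
    int value = 0;
    stream >> value; // no seek(0) needed before reading
    assert(value == 42);
    assert(stream.unread_bytes() == 0);
}
```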
In the future I would like to replace all usages of BufferStream with
InputMemoryStream, OutputMemoryStream (doesn't exist yet) and
DuplexMemoryStream. For now I just add DuplexMemoryStream though.
FlyString is a flyweight string class that wraps a RefPtr<StringImpl>
known to be unique among the set of FlyStrings. The class is very
unoptimized at the moment.
When to use FlyString:
- When you want O(1) string comparison
- When you want to deduplicate a lot of identical strings
When not to use FlyString:
- For strings that don't need either of the above features
- For strings that are likely to be unique
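The flyweight idea behind the O(1) comparison can be sketched with a global table of unique strings. InternedString is a hypothetical name; unlike FlyString it is built on std::unordered_set rather than StringImpl, never removes entries from the table, and is not thread-safe:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

class InternedString {
public:
    // Inserting an already-known string returns the existing entry, so equal
    // strings always end up pointing at the same table element.
    explicit InternedString(std::string const& text)
        : m_pointer(&*table().insert(text).first)
    {
    }

    // O(1): compares addresses, never characters.
    bool operator==(InternedString const& other) const { return m_pointer == other.m_pointer; }

    std::string const& text() const { return *m_pointer; }

private:
    static std::unordered_set<std::string>& table()
    {
        static std::unordered_set<std::string> s_table;
        return s_table;
    }

    std::string const* m_pointer;
};

inline void example()
{
    InternedString a("hello");
    InternedString b("hello");
    InternedString c("world");
    assert(a == b);                    // same table entry
    assert(!(a == c));                 // different entries
    assert(&a.text() == &b.text());    // the characters are stored only once
}
```

The table lookup on construction is the cost being paid, which is why this is a poor fit for strings that are likely to be unique.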