Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
This helper constructor exists on the unspecialized Span<T> class also,
and is convenient for e.g. creating Bytes from:
u8 buffer[64];
Bytes bytes { buffer };
From the getentropy() man page, "The maximum permitted value for the
length argument is 256". Several of our tests were passing lengths of
several thousand bytes, causing getentropy() to fail with EIO, which we
were completely ignoring. This caused these tests to only test long
sequences of 0x00.
We now loop over the provided buffer to fill it 256 bytes at a time. If
getentropy() fails for any reason, we fall back to the default method of
filling it with one random byte at a time.
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.
We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:
13.62s to 10.9s on Serenity (cold)
13.62s to 9.22s on Serenity (warm)
2.93s to 2.32s on Linux
One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.
We currently only fill a buffer when it is empty. So if it has 1 byte
and 16 KB was requested, only that 1 byte would be returned. Instead,
attempt to refill the buffer when it's size is less than the requested
size.
When reading, we currently only fill a BufferedStream's buffer when it
is empty, and only with 1 KB of data. This means that while the buffer
defaults to a size of 16 KB, at least 15 KB is always unused.
The current implementation of `Array<T, 0>` has a zero-length C array as
its storage type. While this is accepted as a GNU extension, when
compiling with Clang 16, an UBSan error is raised every time an object
is accessed whose only field is a zero-length array.
This is likely a bug in Clang 16's implementation of UBSan, which has
been reported here: https://github.com/llvm/llvm-project/issues/61775
Else:
AK/BitStream.h:218:24:
error: inline function '...::lsb_mask<unsigned char>' is not
defined [-Werror,-Wundefined-inline]
static constexpr T lsb_mask(T bits)
^
We current buffer one byte of data from the underlying stream. And when
we pull bits off that buffer, we do so 1 or 8 bits at a time (depending
on whether the buffer is byte aligned). The 1-bit-at-a-time loop is by
far the most common during e.g. GZIP decompression.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to extract the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:
242s to 35s on Serenity
11.125s to 3.527s on Linux
Note that BigEndianInputBitStream can also use the same techniques,
and some of the methods here may make sense to live in an endianness-
agnostic base class. The focus is GZIP right now though, which only
uses the little endian stream.
This mirrors String::from_utf8(StringView).
Jakt will use this to construct strings instead of just assuming the
allocation will succeed, lowering the API difference between
Jakt::String and AK::String by one API :^)
This patch parses enough of GPOS tables to be able to support the
kerning information embedded in Inter.
Since that specific font only applies positioning offsets to the first
glyph in each pair, I was able to get away with not changing our API.
Once we start adding support for more sophisticated positioning, we'll
need to be able to communicate more than a simple "kerning offset" to
the clients of this code.
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.
Co-authored-by: Luke Wilde <lukew@serenityos.org>
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").
Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).
No functional changes, just a lot of new FIXMEs.
We don't need to decode the entire code point to know its length. This
reduces the runtime of decoding a string containing 5 million instances
of U+10FFFF from over 4 seconds to 0.9 seconds.
Let's add FlyString::from_deprecated_fly_string() so we can use it
instead of FlyString::from_utf8(). This will make it easier to detect
potential unncessary allocations as we transfer to FlyString.
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
Indented #cmakedefine01 is supported since CMake 3.10:
https://cmake.org/cmake/help/latest/release/3.10.html#commands
We're on 3.16, and the minimum required for Serenity itself is 3.25, so
this should be fine. And it makes CLion's auto-formatter much happier!
For example, the code point U+002F could be encoded as UTF-8 with the
bytes 0x80 0xAF. This trick has historically been used to bypass
security checks.
This is needed to have code for creating an in-memory sRGB profile using
the (floating-ppoint) numbers from the sRGB spec and having the
fixed-point values in the profile match what they are in other software
(such as GIMP).
It has the side effect of making the FixedPoint ctor no longer constexpr
(which seems fine; nothing was currently relying on that).
Some of FixedPoint's member functions don't round yet, which requires
tweaking a test.
`consume_until(foo)` stops before foo, and so does
`ignore_until(Predicate)`, so let's make the other `ignore_until()`
overloads consistent with that so they're less confusing.
This commit moves the implementation of getopt into AK, and converts its
API to understand and use StringView instead of char*.
Everything else is caught in the crossfire of making
Option::accept_value() take a StringView instead of a char const*.
With this, we must now pass a Span<StringView> to ArgsParser::parse(),
applications using LibMain are unaffected, but anything not using that
or taking its own argc/argv has to construct a Vector<StringView> for
this method.
The output of the DeprecatedString::bijective_base_from() is now
correct for numbers larger than base^2.
This makes column names display correctly in Spreadsheet.
We briefly discussed this when adding the new String type but couldn't
settle on a name. However, having to use String::from_utf8() on every
literal string is a bit unwieldy, so let's have these options available!
Naming-wise '_string' is not as short as 'sv' but should be relatively
clear; it also matches '_bigint' and '_ubigint' in length.
'_short_string' may be longer than the actual string itself, but it's
still an improvement over the static function :^)
Since our C++ source files are UTF-8 encoded anyway, it should be
impossible to create a string literal with invalid UTF-8, so including
that in the name is not as important as in the function that can receive
arbitrary data.
At the moment, this processes the RIFF chunk structure and extracts
the ICCP chunk, so that `icc` can now print ICC profiles embedded
in webp files. (And are image files really more than containers
of icc profiles?)
It doesn't even decode image dimensions yet.
The lossy format is a VP8 video frame. Once we get to that, we
might want to move all the image decoders into a new LibImageDecoders
that depends on both LibGfx and LibVideo. (Other newer image formats
like heic and av1f also use video frames for image data.)
This naming scheme matches Vector.
This also changes `take_last` to move the value it takes, and delete by
known pointer, avoiding a full lookup and potential copies.
Until now, it was possible to assign a RP<T const> or NNRP<T const>
to RP<T> or NNRP<T>. This meant that the constness of the T was lost.
We had a lot of code that relied on this sloppiness, and by the time
you see this commit, I hopefully found and fixed all of it. :^)
This stops us needing a lot of ugly `FlyString { ... }` wrappers. THis
is the behavior that `DeprecatedFlyString(DeprecatedString)` has so it
should be fine.
The patch also contains modifications on several classes, functions or
files that are related to the `JPGLoader`.
Renaming include:
- JPGLoader{.h, .cpp}
- JPGImageDecoderPlugin
- JPGLoadingContext
- JPG_DEBUG
- decode_jpg
- FuzzJPGLoader.cpp
- Few string literals or texts
Instead of rehashing on collisions, we use Robin Hood hashing: a simple
linear probe where we keep track of the distance between the bucket and
its ideal position. On insertion, we allow a new bucket to "steal" the
position of "rich" buckets (those near their ideal position) and move
them further down.
On removal, we shift buckets back up into the freed slot, decrementing
their distance while doing so.
This behavior automatically optimizes the number of required probes for
any value, and removes the need for periodic rehashing (except when
expanding the capacity).
This approximation tries to generate values within 0.1% of their actual
expected value. Microbenchmarks indicate that this iterative SIMD
version can be up to 60x faster than `AK::SIMD::exp`.
The parser is still very much a work-in-progress, but it can currently
parse most of the basic bits, the only *completely* unimplemented things
in the parser are:
- heredocs (io_here)
- alias expansion
- arithmetic expansion
There are a whole suite of bugs, and syntax highlighting is unreliable
at best.
For now, this is not attached anywhere, a future commit will enable it
for /bin/sh or a `Shell --posix` invocation.
This ensures constructors that take a span or an initializer_list
don't allocate when there's already enough inline storage.
(Previously these constructors always allocated)
This is done by providing Traits<ByteBuffer>::equals functions for
(Readonly)Bytes, as the base GenericTraits<T>::equals is unable to
convert the ByteBuffer to (Readonly)Bytes to then use Span::operator==
This allows us to check if a Vector<ByteBuffer> contains a
(Readonly)Bytes without having to making a copy of it into a ByteBuffer
first. The initial use of this is in LibWeb with CORS-preflight, where
we check the split contents of the Access-Control headers with
Fetch::Infrastructure::Request::method() and static StringViews
such as "*"sv.bytes().
It wouldn't make much sense on its own (as the Kernel only has errno
Errors), but it's an easy fix for not having to ifdef away every single
usage of `is_errno` in code that is shared between Userland and Kernel.
This code should not be used in the kernel - we should always propagate
proper errno codes in case we need to return those to userland so it
could decode it in a reasonable way.
This new method is meant to be used in both userspace and kernel code.
The idea is to allow printing of a verbose message and then returning an
errno code which is the proper mechanism for kernel code because we
should almost always assume that such error will be propagated back to
userspace in some way, so the userspace code could reasonably decode it.
For userspace code however, this new method is meant to be a simple
wrapper for Error::from_string_view, because for most invocations, it's
much more useful to have a verbose & literal error than a errno code, so
we simply ignore that errno code completely in such context.
For example, consider cases where we want to propagate errors only in
specific instances:
auto result = read_data(); // something like ErrorOr<ByteBuffer>
if (result.is_error() && result.error().code() != EINTR)
continue;
auto bytes = TRY(result);
The TRY invocation will currently copy the byte buffer when the
expression (in this case, just a local variable) is stored into
_temporary_result.
This patch binds the expression to a reference to prevent such copies.
In less trival invocations (such as TRY(some_function()), this will
incur only temporary lifetime extensions, i.e. no functional change.
As of now, there is a default copy constructor on Error. A future commit
will make this non-public to prevent implicit copies, so to prepare for
that, this adds a factory for the few cases where a copy is really
needed.
Just because we may compile serenity with or without NDEBUG doesn't
mean that consuming projects or Ports will share the setting.
Always define the custom assertion function so that we don't have to
keep the same debug settings between all projects.
This API is only used by Jakt to implement weak reference unwrapping.
By making it return a NonnullRefPtr, it can be assigned to anything
that accepts a NonnullRefPtr, unlike the previous T* return type (since
that can also be null).
Template argument are checked to ensure that the `Out` type is equal or
convertible to the type returned by the invokee.
Compilation now fails on:
`Function<void()> f = []() -> int { return 0; };`
But this is allowed:
`Function<ErrorOr<int>()> f = []() -> int { return 0; };`
Pretty much no other read function does this, and getting rid of the
typename template parameter for the stream makes the transition to the
new AK::Stream a bit easier.
Similar to the return values earlier, a signed value doesn't really make
sense here. Relying on the much more standard `size_t` makes it easier
to use Stream in all contexts.
Quick select is an algorithm that is able to find the median
of a Vector without fully sorting it.
This replaces the old very naive implementation
for `AK::Statistics::median()` with `AK::quickselect_inline`
This adds the quick select algorithm that finds
the kth smallest element for any collection.
Whilst doing so it also partially sorts the collection.
I have also included the option to use different pivoting functions
including median of medians which makes the quick select have
a truely linear time complexity at the costs of enormous overhead,
so this that only really useful for really large datasets.
The same was chosen to reflect the fact that it modifies
the collection in place during the selection process.
The AnyString concept is currently broken because it checks whether a
StringView is constructible from a type T. The StringView constructors,
however, only accept constant rvalue references - i.e. `T const&`.
This also adds a test to ensure this continues to work.
`Stream` will be qualified as `AK::Stream` until we remove the
`Core::Stream` namespace. `IODevice` now reuses the `SeekMode` that is
defined by `SeekableStream`, since defining its own would require us to
qualify it with `AK::SeekMode` everywhere.
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
As a nearby comment says, "This is a terrible approximation".
This doesn't make things less terrible, but it does make things
more correct in the given framework of terribleness.
Fixes#17156.
A lot of places were relying on AK/Traits.h to give it strnlen, memcmp,
memcpy and other related declarations.
In the quest to remove inclusion of LibC headers from Kernel files, deal
with all the fallout of this included-everywhere header including less
things.
In cases where we know a string literal will fit in the short string
storage, we can do so at compile time without needing to handle error
propagation. If the provided string literal is too long, a compilation
error will be emitted due to the failed VERIFY statement being a non-
constant expression.
This does not need to be defined in Format.h. This causes FixedPoint.h
to be included everywhere. This is particularly going to be an issue
when trying to include <CoreServices/CoreServices.h> on macOS. The macOS
SDK defines its own FixedPoint structure which will conflict with ours.
The Unicode spec defines much more complicated caseless matching
algorithms in its Collation spec. This implements the "basic" case
folding comparison.
`get()` is intended as a replacement for `get_deprecated()` and `get_ptr
()`. The former returns the same value for "key not found" and "key
found and contains `null`" which is ambiguous. The latter returns a raw
pointer which is spooky. Returning `Optional<JsonValue const&>` covers
all the previous uses for these.
The `get_foo()` methods are helpers to make user code less verbose. Most
of the time, we only want a specific type of value: if we want a number
and get a string, we respond the same as if the value was not there at
all. These make that easier to express.
This also adjusts the `has_i32()` method and friends to examine the
value instead of just looking at the underlying type.
The existing `is_i32()` and friends only check if `i32` is their
internal type, but a value such as `0` could be literally any integer
type internally. `is_integer<T>()` instead determines whether the
contained value is an integer and can fit inside T.
While at it, rename the `read_trivial_value` and `write_trivial_value`
functions to `read_value` and `write_value` respectively, since we'll
add compatibility for non-trivial types down the line.
This error was introduced by 9a7accdd and had a significant impact on
`BufferedFile` behavior. Hence, we started seeing crash in test262.
By itself, the issue was a wrong calculation of the internal reading
spans when using the `read` and `until` parameters. Which can lead to
at worse crash in VERIFY and at least weird behaviors as missed needles
or detections out of bounds.
It was also accompanied by an erroneous test.
This patch fixes the bug, the test and also provides more tests.
If USING_AK_GLOBALLY is not defined, the name IsLvalueReference might
not be available in the global namespace. Follow the pattern established
in LibTest to fully qualify AK types in macros to avoid this problem.
This parameter allows to start searching after an offset. For example,
to resume a search.
It is unfortunately a breaking change in API so this patch also modifies
one user and one test.
The previously defined operator was swap-based. With the defaulted
implementation, both integers are now copied, but it doesn't matter as
only the `ByteBuffer` allocates memory (i.e. non-null integers values
won't affect the destruction).
These are formatters that can only be used with debug print
functions, such as dbgln(). Currently this is limited to
Formatter<ErrorOr<T>>. With this you can still debug log ErrorOr
values (good for debugging), but trying to use them in any
String::formatted() call will fail (which prevents .to_string()
errors with the new failable strings being ignored).
You make a formatter debug only by adding a constexpr method like:
static constexpr bool is_debug_only() { return true; }
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.
FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
This allows us to make all comparision operators on the class constexpr
without pulling in a bunch of boilerplate. We don't use the `<compare>`
header because it doesn't compile in the main serenity cross-build due
to the include paths to LibC being incompatible with how libc++ expects
them to be for clang builds.
The previous implementation of Statistics::median() was slightly
incorrect with an even number of elements since in those cases it needs
to be the arithmetic mean of the two elements that share the middle
position.
These instances were detected by searching for files that include
AK/Memory.h, but don't match the regex:
\\b(fast_u32_copy|fast_u32_fill|secure_zero|timing_safe_compare)\\b
This regex is pessimistic, so there might be more files that don't
actually use any memory function.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
stdlib.h, but don't match the regex:
\\b(_abort|abort|abs|aligned_alloc|arc4random|arc4random_buf|arc4random_
uniform|atexit|atof|atoi|atol|atoll|bsearch|calloc|clearenv|div|div_t|ex
it|_Exit|EXIT_FAILURE|EXIT_SUCCESS|free|getenv|getprogname|grantpt|labs|
ldiv|ldiv_t|llabs|lldiv|lldiv_t|malloc|malloc_good_size|malloc_size|mble
n|mbstowcs|mbtowc|mkdtemp|mkstemp|mkstemps|mktemp|posix_memalign|posix_o
penpt|ptsname|ptsname_r|putenv|qsort|qsort_r|rand|RAND_MAX|random|reallo
c|realpath|secure_getenv|serenity_dump_malloc_stats|serenity_setenv|sete
nv|setprogname|srand|srandom|strtod|strtof|strtol|strtold|strtoll|strtou
l|strtoull|system|unlockpt|unsetenv|wcstombs|wctomb)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use anything from the stdlib.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Concepts.h, but don't match the regex:
\\b(AnyString|Arithmetic|ArrayLike|DerivedFrom|Enum|FallibleFunction|Flo
atingPoint|Fundamental|HashCompatible|Indexable|Integral|IterableContain
er|IteratorFunction|IteratorPairWith|OneOf|OneOfIgnoringCV|SameAs|Signed
|SpecializationOf|Unsigned|VoidFunction)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any concepts.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Format.h, but don't match the regex:
\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBu
ilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Fo
rmattable|Formatter|__format_value|HasFormatter|max_format_arguments|out
|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeEr
asedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vf
ormat|vout|warn|warnln|warnln_if)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any formatting functions.
Observe that this revealed that Userland/Libraries/LibC/signal.cpp is
missing an include.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
obtain any lock rank just via the template parameter. It was not
previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
wanted in the first place (or made use of). Locks randomly changing
their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
taken in the right order (with the right lock rank checking
implementation) as rank information is fully statically known.
This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
Using policy based design `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.
This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
A possible integer overflow might have occured inside the function in
case (number % unit) * 10 did not fit into a u64. So it is verified that
this does not happen at the beginning of the function.
These instances were detected by searching for files that include
IterationDecision.h, but don't match the regex:
\\bIterationDecision(?!\.h>)\\b
This is the only symbol defined by IterationDecision.h.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
Array.h, but don't match the regex:
\\b(Array(?!\.h>)|iota_array|integer_sequence_generate_array)\\b
These are the three symbols defined by Array.h.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
The class is very similar to `CircularDuplexStream` in its behavior.
Main differences are that `CircularBuffer`:
- does not inherit from `AK::Stream`
- uses `ErrorOr` for its API
- is heap allocated (and OOM-Safe)
This patch also add some tests.
This file does not contain any architecture specific implementations,
so we can move it to the Kernel base directory. Also update the relevant
include paths.
Mark other ErrorOr types as friends, and fix a typo in the &&
constructor, so that we can create an ErrorOr<Core::Object> from an
ErrorOr<GUI::Widget>. Also, add some requires() clauses to these
constructors so the error messages are clearer.
Previously any backslash and the character following it were ignored.
This commit adds a fall through to match the character following the
backslash without checking whether it is "special".
This allows callers to use the following semantics:
using MyVariant = Variant<Empty, int>;
template<typename T>
size_t size() { return TypeList<T>::size; }
auto s = size<MyVariant>();
This will be needed for an upcoming IPC change, which will result in us
knowing the Variant type, but not the underlying variadic types that the
Variant holds.
This shrinks sizeof(Error) from 32 bytes to 24 bytes, which in turn will
shrink sizeof(ErrorOr<T>) by the same amount (in cases where sizeof(T)
is less than sizeof(Error)).
Instead of avoiding overflow-checking builtins with AK_COMPILER_CLANG,
we can use the preprocessor's __has_builtin() mechanism to check if
they are available.
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.
This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
This class is a smart pointer that let you provide a custom deleter to
free the pointer.
It is quite primitive compared to other smart pointers but can still be
useful when interacting with C types that provides a custom `free()`
function.
This was removed in a910961f37d1da9dafb6385e348266746354cf98 in favour
of the more general USING_AK_GLOBALLY #define, but Ladybird (and
probably other projects) depend on the smaller hammer to include STL
headers and keep the USING_AK_GLOBALLY behaviour, so put it back and
preserve its behaviour.
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.
std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
A couple headers expected names to be in the global namespace, qualify
those names to make sure they're resolved even when the names are not
exported.
One header placed its functions in the global namespace, move those to
the AK namespace to make the concepts resolve.
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
Note that Jakt only allows StringView creation from string literals, so
none of the invariants in the class are broken by this (if used only
from within Jakt).
This allows the user to transform the contents of the optional (if any
exists), without manually unwrapping and then rewrapping it.
This is needed by the Jakt runtime.
This is used in Jakt, and providing that value from Jakt's side is more
trouble than doing this.
Considering this class is bound to go away, a little
backwards-compatible API change is just fine.
The previous moved-from state was the null string. This violates both
our invariant that String is never null, and also the C++ contract that
the moved-from state must be valid but unspecified. The empty short
string state is of course valid, so it satisfies both invariants. It
also allows us to remove any extra checks for the null state.
The reason this change is made is primarily because swap() requires
moved-from objects to be reassignable (C++ allows this). Because the
move assignment of String would not check the null state, it crashed
trying to increment the data reference count (nullptr signals a
non-short string). This meant that e.g. quick_sort'ing String would
crash immediately.
s p a c e s h i p o p e r a t o r
Comparing UTF-8 can be done by simple byte lexicographic comparison per
definition, so we just piggy-back on StringView's high-performance
comparator.
Similar to how LibJS and LibSQL used to behave, the boolean constructor
of JsonValue is currently allowing pointers to be used to construct a
boolean value. Explicitly disallow such construction.
This allows us to pass the new String type to functions that take a
StringView directly, having to call bytes_as_string_view() every time
gets old quickly.
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
an UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
Previously we allowed the end_offset to be larger than the chunk itself,
which made it so that certain input sizes would make the logic attempt
to delete a nonexistent object.
Fixes#16308.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This patch adds support for 128-bit floating points in FloatExtractor.
This is required to build SerenityOS on MacOS/aarch64. It might break
building for Raspberry Pi.
AK internals like to use concepts and details without a fully qualified
name, which usually works just fine because we make everything
AK-related available to the unqualified namespace.
However, this breaks as soon as we start not using `USING_AK_GLOBALLY`,
due to those identifiers no longer being made available. Instead, we
just export those into the `AK` namespace instead.