This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
On argument swapping to put positional ones toward the end,
m_arg_index was pointing at "last arg index" + "skipped args" +
"consumed args" and thus was pointing ahead of the skipped ones.
m_arg_index now points after the current parsed option arguments.
Using a vector to represent a list of painting commands results in many
reallocations, especially on pages with a lot of content.
This change addresses it by introducing a SegmentedVector, which allows
fast appending by representing a list as a sequence of fixed-size
vectors. Currently, this new data structure supports only the
operations used in RecordingPainter, which are appending and iterating.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
The slugify function is used to convert input into URL-friendly slugs.
It processes each character in the input, keeping ascii alpha characters
after lowercase and replacing non-alphanum characters with the glue
character or a space if multiple spaces are encountered consecutively.
The resulting string is trimmed of leading and trailing whitespace, and
any internal whitespace is replaced with the glue character.
It is currently used in LibMarkdown headings generation code.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
We now null out smart pointers *before* calling unref on the pointee.
This ensures that the same smart pointer can't be used to acquire a new
reference to the pointee after its destruction has begun.
I ran into this when destroying a non-empty IntrusiveList of RefPtrs,
but the problem was more general so this fixes it for all of RefPtr,
NonnullRefPtr, OwnPtr and NonnullOwnPtr.
This approximation tries to generate values within 0.1% of their actual
expected value. Microbenchmarks indicate that this iterative SIMD
version can be up to 60x faster than `AK::SIMD::exp`.
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.
FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
The class is very similar to `CircularDuplexStream` in its behavior.
Main differences are that `CircularBuffer`:
- does not inherit from `AK::Stream`
- uses `ErrorOr` for its API
- is heap allocated (and OOM-Safe)
This patch also add some tests.
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.
This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
an UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Currently, the floating point to string conversion is implemented
several times across the codebase. This commit provides a pretty
low-level function to unify all of such conversions. It converts the
given double to a fixed point decimal satisfying a few correctness
criteria.
This is based on the paper by Daniel Lemire called
"Number parsing at a Gigabyte per second", currently available at
https://arxiv.org/abs/2101.11408
An implementation can be found at
https://github.com/fastfloat/fast_float
To support both strtod like methods and String::to_double we have two
different APIs. The parse_first_floating_point gives back both the
result, next character to read and the error/out of range status.
Out of range here means we rounded to infinity 0.
The other API, parse_floating_point_completely, will return a floating
point only if the given character range contains just the floating point
and nothing else. This can be much faster as we can skip actually
computing the value if we notice we did not parse the whole range.
Both of these APIs support a very lenient format to be usable in as many
places as possible. Also it does not check for "named" values like
"nan", "inf", "NAN" etc. Because this can be different for every usage.
For integers and small values this new method is not faster and often
even a tiny bit slower than the current strtod implementation. However
the strtod implementation is wrong for a lot of values and has a much
less predictable running time.
For correctness this method was tested against known string -> double
datasets from https://github.com/nigeltao/parse-number-fxx-test-data
This method gives 100% accuracy.
The old strtod gave an incorrect value in over 50% of the numbers
tested.
This is a set of functions that allow you to convert between arbitrary
IEEE 754 floating point types, as long as they can be represented
within 64 bits. Conversion methods between floats and doubles are
provided, as well as a generic `float_to_float()`.
Example usage:
#include <AK/FloatingPoint.h>
double val = 1.234;
auto weird_f16 =
convert_from_native_double<FloatingPointBits<0, 6, 10>>(val);
Signed and unsigned floats are supported, and both NaN and +/-Inf are
handled correctly. Values that do not fit in the target floating point
type are clamped.
This is an enum-like type that works with arbitrary sized storage > u64,
which is the limit for a regular enum class - which limits it to 64
members when needing bit field behavior.
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
This is the IPv6 counter part to the IPv4Address class and implements
parsing strings into a in6_addr and formatting one as a string. It
supports the address compression scheme as well as IPv4 mapped
addresses.
The goal of this file is to enable C++ overloaded functions for
standard builtin functions that we use. It contains fallback
implementations for systems that do not have the builtins available.
DisjointChunks<T> provides a nice interface over multiple sequential
Vector<T>'s, allowing the user to iterate over/index into/slice from
said buffers as if they were a single contiguous buffer.
To work with views on such objects, DisjointSpans<T> is provided, which
has the same behaviour but does not own the underlying objects.
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
The following commit broke Tests/AK/TestJSON.cpp as it removed the
file that the test loaded from disk to validate JSON parsing.
commit ad141a2286
Author: Andreas Kling <kling@serenityos.org>
Date: Sat Jul 31 15:26:14 2021 +0200
Base: Remove "test.frm" from HackStudio test project
Instead of restoring the file, lets just embed a bit of JSON in the
test case to avoid using external resources, as they obviously are
surprising and make the test less portable across environments.
Let's bring this class back, but without the confusing resize() API.
A FixedArray<T> is simply a fixed-size array of T.
The size is provided at run-time, unlike Array<T> where the size is
provided at compile-time.
Doing these as custom classes might be faster, especially when writing
them in SSE, but this would cause a lot of Code duplication and due to
the nature of constexprs and the intelligence of the compiler they might
be using SSE/MMX either way