This was added in commit f2663f477f as a
partial implementation of what is now LibWeb's forgiving Base64 decoder.
All use cases within LibWeb that require whitespace skipping now use
that implementation instead.
Removing this feature from AK allows us to know the exact output size of
a decoded Base64 string. We can still trim whitespace at the start and
end of the input, though; for example, this is useful when reading from
a file that may end with a newline.
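To illustrate (a hypothetical call pattern, not code from this commit):
with interior whitespace skipping gone, a caller reading from such a
file trims the ends itself before decoding.

```
// Sketch: trim leading/trailing whitespace before handing the data to
// the now-strict decoder. `contents` stands in for bytes read from a
// file that may end with a newline.
auto contents = "aGVsbG8=\n"sv;
auto decoded = TRY(decode_base64(contents.trim_whitespace()));
```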
This encoding scheme comes from section 5 of RFC 4648, as an
alternative to the standard base64 encode/decode methods.
The only difference is that the last two alphabet characters are
replaced with '-' and '_', as '+' and '/' are not safe in URLs or
filenames.
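As a quick sketch of the mapping (illustrative code, not this commit's
implementation), a standard base64 string can be turned into its
URL-safe form by swapping exactly those two characters:

```
#include <string>

// Convert standard base64 (RFC 4648 §4) output to the URL- and
// filename-safe alphabet (RFC 4648 §5) by swapping the two unsafe
// characters; the rest of the alphabet is identical.
std::string to_base64url(std::string encoded)
{
    for (auto& c : encoded) {
        if (c == '+')
            c = '-'; // 62nd alphabet character
        else if (c == '/')
            c = '_'; // 63rd alphabet character
    }
    return encoded;
}
```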
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs Punycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
Otherwise, we percent-encode negative signed chars incorrectly. For
example, https://www.strava.com/login contains the following hidden
<input> field:
<input name="utf8" type="hidden" value="✓" />
On submitting the form, we would percent-encode that field as:
utf8=%-1E%-64%-6D
Which would cause us to receive an HTTP 500 response. We now properly
percent-encode that field as:
utf8=%E2%9C%93
And can log in to Strava :^)
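A minimal repro of this bug class (assumed shape; the actual LibWeb
code differs): bytes >= 0x80 sign-extend when `char` is signed, so 0xE2
becomes -30 and a naive hex formatter renders it as "-1E".

```
#include <cstdio>

int main()
{
    char const* checkmark = "\xE2\x9C\x93"; // UTF-8 for U+2713 '✓'
    for (char const* p = checkmark; *p; ++p) {
        // Wrong: *p is a (signed) char, so 0xE2 sign-extends to -30.
        // Right: reinterpret the byte as unsigned before widening.
        std::printf("%%%02X", static_cast<unsigned char>(*p));
    }
    std::printf("\n"); // prints %E2%9C%93
}
```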
We already have a helper to split a StringView by line while considering
"\n", "\r", and "\r\n". Add an analagous method to just count the number
of lines in the same manner.
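A sketch of the counting logic (illustrative, not AK's code), using the
convention that N terminators delimit N+1 lines:

```
#include <cstddef>
#include <string_view>

// Count lines, treating "\r\n" as a single terminator and lone "\n" or
// "\r" as one terminator each.
size_t count_lines(std::string_view text)
{
    size_t lines = 1;
    for (size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n')
                ++i; // consume the '\n' half of a "\r\n" pair
            ++lines;
        } else if (text[i] == '\n') {
            ++lines;
        }
    }
    return lines;
}
```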
If the BufferedStream is able to fill its entire circular buffer in
populate_read_buffer() and is later asked to read a line or read until
a delimiter, it could erroneously return EMSGSIZE if the caller's buffer
was smaller than the internal buffer. In this case, all we really care
about is whether the caller's buffer is big enough for however much data
we're going to copy into it, which needs to take the matched delimiter
candidate into account.
This makes it possible to use MakeIndexSequence in functions like:
template<typename T, size_t N>
constexpr auto foo(T (&a)[N])
This means AK/StdLibExtraDetails.h must now include AK/Types.h
for size_t, which means AK/Types.h can no longer include
AK/StdLibExtras.h (which arguably it shouldn't do anyway),
which requires rejiggering some things.
(IMHO Types.h shouldn't use AK::Details metaprogramming at all.
FlatPtr doesn't necessarily have to use Conditional<> and ssize_t could
maybe be in its own header or something. But since it's tangential to
this PR, going with the tried and true "lift things that cause the
cycle up to the top" approach.)
When swapping arguments to move positional ones toward the end,
m_arg_index was pointing at "last arg index" + "skipped args" +
"consumed args", and thus ahead of the skipped ones.
m_arg_index now points just past the option arguments parsed so far.
`JsonValue::to_byte_string` has peculiar type-erasure semantics which
are not usually intended. Unfortunately, it also has a very
ordinary-looking name which does not warn about this unexpected
behavior. So let's prefix it with `deprecated_` to make new code use
`as_string` if it just wants the string value, or
`serialized<StringBuilder>` if it needs to do proper serialization.
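A usage sketch of the distinction (the value is hypothetical; the
method names are the ones given above):

```
JsonValue value { 42 };

// Type-erasing: stringifies *any* value type, yielding "42" here even
// though the value is a number.
auto erased = value.deprecated_to_byte_string();

// Intent-revealing alternatives:
if (value.is_string()) {
    auto str = value.as_string(); // only for actual string values
}
auto json = value.serialized<StringBuilder>(); // proper serialization
```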
A bunch of users used consume_specific with a constant ByteString
literal, which can be replaced by an allocation-free StringView literal.
The generic consume_while overload gains a requires clause so that
consume_specific("abc") causes a more understandable and actionable
error.
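A representative before/after at a call site (illustrative; the actual
call sites vary):

```
GenericLexer lexer { "key=value"sv };

// Before: constructing a ByteString allocates just to compare bytes.
// lexer.consume_specific(ByteString { "key" });

// After: a StringView literal compares without allocating.
if (lexer.consume_specific("key"sv) && lexer.consume_specific('=')) {
    auto value = lexer.consume_all();
}
```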
Using a vector to represent a list of painting commands results in many
reallocations, especially on pages with a lot of content.
This change addresses it by introducing a SegmentedVector, which allows
fast appending by representing a list as a sequence of fixed-size
vectors. Currently, this new data structure supports only the
operations used in RecordingPainter, which are appending and iterating.
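A minimal sketch of the idea (not AK's implementation): elements live
in fixed-size segments, so appending never relocates existing elements;
only the small table of segment pointers ever reallocates.

```
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

template<typename T, size_t SegmentSize = 512>
class SegmentedList {
public:
    void append(T value)
    {
        if (m_segments.empty() || m_size_of_last == SegmentSize) {
            m_segments.push_back(std::make_unique<std::array<T, SegmentSize>>());
            m_size_of_last = 0;
        }
        (*m_segments.back())[m_size_of_last++] = std::move(value);
    }

    template<typename Callback>
    void for_each(Callback callback) const
    {
        for (size_t s = 0; s < m_segments.size(); ++s) {
            size_t count = (s + 1 == m_segments.size()) ? m_size_of_last : SegmentSize;
            for (size_t i = 0; i < count; ++i)
                callback((*m_segments[s])[i]);
        }
    }

private:
    // This simplified sketch requires T to be default-constructible.
    std::vector<std::unique_ptr<std::array<T, SegmentSize>>> m_segments;
    size_t m_size_of_last { 0 };
};
```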
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
opt = view.to_int<I>();
else
opt = view.to_uint<I>();
```
for us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.
Previously, `replace` used `find_all` to find all of the positions to
replace. But `find_all` finds all the *overlapping* instances of the
needle, while `replace` assumed that the next position was always at
least `needle.length()` away from the last one. This led to crashes like
https://github.com/SerenityOS/jakt/issues/1159.
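The distinction, sketched (illustrative code): in "aaa", overlapping
search finds "aa" at positions 0 and 1, while a replace-style scan must
resume past each match.

```
#include <string_view>
#include <vector>

// Non-overlapping scan suitable for replace(): after a hit, resume at
// pos + needle.length(), not pos + 1.
std::vector<size_t> find_all_non_overlapping(std::string_view haystack, std::string_view needle)
{
    std::vector<size_t> positions;
    if (needle.empty())
        return positions; // avoid an infinite loop on an empty needle
    size_t pos = 0;
    while ((pos = haystack.find(needle, pos)) != std::string_view::npos) {
        positions.push_back(pos);
        pos += needle.length();
    }
    return positions;
}
```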
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l '\bDeprecatedString\b|deprecated_string' AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pi -e 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep '\.cpp\|\.h')
$ gn format $(git ls-files '*.gn' '*.gni')
There were two issues:
1) the C+=R and C-=R operators expected arithmetic types to have .real()
2) the R+C, R-C, R*C and R/C operators applied the operation in the
   wrong order (i.e. did C+R, C-R, C*R and C/R instead). This wouldn't
   matter for + and *, which are commutative, but is incorrect for
   - and /.
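A toy model of issue 2 (not AK's actual Complex class), showing why
operand order matters for the non-commutative operators:

```
#include <cassert>

struct Complex {
    double re {}, im {};
    Complex operator-(Complex const& other) const
    {
        return { re - other.re, im - other.im };
    }
};

// R - C must promote R and subtract C from it, not the other way round.
Complex operator-(double real, Complex const& c)
{
    return Complex { real, 0.0 } - c; // correct order
    // return c - Complex { real, 0.0 }; // the old, wrong order
}

int main()
{
    auto result = 1.0 - Complex { 3.0, 4.0 };
    assert(result.re == -2.0 && result.im == -4.0);
}
```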
Let's replace this bool with an `enum class` in order to enhance
readability. This is done by repurposing `MappedFile`'s `OpenMode` into
a shared `enum` simply called `Mode`.
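The shape of the change, sketched (the `map_file` helper and the
enumerator names here are hypothetical stand-ins, not the actual API):

```
// Before: a bare bool leaves the reader guessing at the call site.
//   auto file = map_file(path, true);

// After: a shared enum makes the intent explicit.
enum class Mode {
    ReadOnly,
    ReadWrite,
};
//   auto file = map_file(path, Mode::ReadWrite);
```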
Previously, `URLParser` was constructing a new String for every
character of the URL's username and password. This change improves
performance by eliminating those unnecessary String allocations.
A URL with a 100,000 character password can now be parsed in ~30ms vs
~8 seconds previously on my machine.
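The general pattern of such a fix (a sketch, not the actual URLParser
diff): accumulate into one StringBuilder instead of constructing a
fresh String per character.

```
// Before (quadratic): every iteration copies the whole string so far.
//   String result;
//   for (auto code_point : input)
//       result = TRY(String::formatted("{}{}", result, code_point));

// After (linear): append into a single builder, materialize once.
StringBuilder builder;
for (auto code_point : input)
    TRY(builder.try_append_code_point(code_point));
auto result = TRY(builder.to_string());
```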
Previously we assumed a default precision of 6, which made the printed
values quite odd in some cases.
This commit changes that default to print them with just enough
precision to produce the exact same float when roundtripped.
This commit adds some new tests that assert exact format outputs, which
have to be modified if we decide to change the default behaviour.
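The standard library has analogous behavior, shown here for comparison
(this is not AK's implementation): std::to_chars without a precision
argument emits the shortest string that parses back to the identical
double.

```
#include <charconv>
#include <cstdio>

int main()
{
    char buffer[64];
    double value = 0.1;
    auto [ptr, ec] = std::to_chars(buffer, buffer + sizeof(buffer), value);
    *ptr = '\0';
    // Shortest round-trip form: "0.1", not "0.100000" as a fixed
    // default precision of 6 would print.
    std::printf("%s\n", buffer);
}
```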
The slugify function is used to convert input into URL-friendly slugs.
It processes each character in the input, lowercasing and keeping ASCII
alphanumeric characters, and replacing runs of non-alphanumeric
characters with a single glue character.
The resulting string is trimmed of leading and trailing whitespace, and
any internal whitespace is replaced with the glue character.
It is currently used in LibMarkdown headings generation code.
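A rough sketch of the described behavior (details assumed; the real
implementation may differ):

```
#include <cctype>
#include <string>

std::string slugify(std::string const& input, char glue = '-')
{
    std::string output;
    for (unsigned char c : input) {
        if (std::isalnum(c))
            output += static_cast<char>(std::tolower(c));
        else if (!output.empty() && output.back() != glue)
            output += glue; // collapse separator runs into one glue char
    }
    if (!output.empty() && output.back() == glue)
        output.pop_back(); // trim trailing glue
    return output;
}
// slugify("Hello,  World!") == "hello-world"
```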
Fixed the issue in StringUtils::convert_to_floating_point() where the
end pointer of the trimmed string was not being passed, causing the
function to consistently return 'None' when given strings with trailing
whitespace.
There were 2 issues with the way we formatted floating point decimals:
if the part after the decimal point exceeded the max of a u64 we would
generate wildly incorrect decimals, and we applied no rounding.
With this new code, we emit decimals one by one and perform a simple
reverse string walk to round the number up if required.
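A sketch of that approach (illustrative; it assumes `numerator * 10`
cannot overflow, which the real code must guarantee):

```
#include <cstdint>
#include <string>

// Emit `digit_count` fraction digits of numerator/denominator (< 1),
// then peek one more digit and propagate the round-half-up carry via a
// reverse walk over the emitted string.
std::string format_fraction(uint64_t numerator, uint64_t denominator, int digit_count)
{
    std::string digits;
    for (int i = 0; i < digit_count; ++i) {
        numerator *= 10;
        digits += static_cast<char>('0' + numerator / denominator);
        numerator %= denominator;
    }
    numerator *= 10;
    if (numerator / denominator >= 5) {
        int i = static_cast<int>(digits.size()) - 1;
        for (; i >= 0 && digits[i] == '9'; --i)
            digits[i] = '0'; // carry ripples left over trailing nines
        if (i >= 0)
            ++digits[i];
        // else: the carry leaves the fraction entirely; the integer
        // part would need incrementing (elided in this sketch).
    }
    return digits;
}
// format_fraction(2, 3, 4) == "6667"
```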
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>
Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
The mentioned functions used m_size / 8 instead of size_in_bytes()
(which performs division with a ceiling rounding mode), resulting in an
off-by-one error: the functions didn't search in the last,
partially-filled byte.
Using size_in_bytes() instead of m_size / 8 fixes this.
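The arithmetic in a nutshell (ceiling division written the usual way,
not necessarily size_in_bytes()'s exact form):

```
// For a 12-bit bitmap:
static_assert(12 / 8 == 1);       // floor division skips the partial byte
static_assert((12 + 7) / 8 == 2); // ceiling division covers it
```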
When working with FixedMemoryStreams, and especially MappedFiles, you
may not want to copy the underlying data when you read from the
stream. Pointing into that data is perfectly fine, as long as you know
that its lifetime is long enough.
This commit adds a couple of methods for reading either a single value,
or a span of them, in this way. As noted, for single values you sadly
get a raw pointer instead of a reference, but that's the only option
right now.
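An illustrative sketch of the concept (not AK's actual types or
signatures): hand out a pointer into the backing memory instead of
copying, leaving lifetime management to the caller.

```
#include <cstddef>
#include <span>

struct InPlaceReader {
    std::span<std::byte const> data;
    size_t offset { 0 };

    // Returns a pointer into `data` (no copy), or nullptr if exhausted.
    // Note: assumes T tolerates this cast and alignment; real code must
    // be more careful.
    template<typename T>
    T const* read_in_place()
    {
        if (offset + sizeof(T) > data.size())
            return nullptr;
        auto const* ptr = reinterpret_cast<T const*>(data.data() + offset);
        offset += sizeof(T);
        return ptr;
    }
};
```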
SipHash is highly HashDoS-resistant, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
There are a bunch of tests that check for time_t handling 64-bit values
properly. This makes sense, but also obviously doesn't work when time_t
is 32-bit, and causes compile-time errors. Compile these tests out in
that case.
Since there's no straightforward way to check sizeof(time_t) inside the
preprocessor, only do this when glibc's __TIMESIZE is defined.
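The guard would look something like this (the test name and body are
illustrative):

```
// sizeof() isn't available to the preprocessor, so key off glibc's
// __TIMESIZE; treat time_t as 64-bit when the macro is absent.
#if !defined(__TIMESIZE) || __TIMESIZE == 64
TEST_CASE(time_t_holds_64_bit_values)
{
    // ... assertions involving post-2038 timestamps ...
}
#endif
```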
The main change is the simplification of the expression
`(10^precision * fraction) / 2^precision` to `5^precision * fraction`.
Whether those expressions overflow depends on the values of `precision`
and `fraction`. For the maximum value of `fraction`, the following table
shows the value of `precision` at which overflow starts to occur:
       Old  New
u32      8   10
u64     15   20
u128    30   39
As of now, the `u64` type is used to calculate the result of the
expression. This means that before, only FixedPoints with `precision`
less than 15 could be accurately rendered in decimal (for every value
of fraction). Now, this limit gets increased to 20.
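The gain comes straight from algebra; the division by 2^precision
cancels exactly:

```
\frac{10^{p} \cdot f}{2^{p}}
  = \frac{(2 \cdot 5)^{p} \cdot f}{2^{p}}
  = 5^{p} \cdot f
```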
This refactor also fixes broken decimal rendering when a precision
width is explicitly specified in the format string, as well as broken
hexadecimal rendering.
Because of the off-by-one error, the second bit of the fraction was
getting ignored when differentiating between fractions equal to 0.5
and those greater than 0.5. This resulted in numbers like 2.75 being
treated as having a fraction of exactly 0.5 and getting rounded
incorrectly (to 2).
There was a small mishmash of argument order, as seen in the table
below:

                 | Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
================ | ======================= | =======================
uses equals()    | HashMap                 | Vector, HashTable
defines equals() | *String[^1]             | ByteBuffer

[^1]: String, DeprecatedString, their Fly-type equivalents and KString.
This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.
I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_search). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.
With this change, each of the following lines will now compile
successfully:
Vector<String>().contains_slow("WHF!"sv);
HashTable<String>().contains("WHF!"sv);
HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
Parsing 'data:' URLs took its own route. It never set standard URL
fields like path, query or fragment (only the scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.
Because parsing 'data:' didn't use standard fields, running the
following JS code:
new URL('#a', 'data:text/plain,hello').toString()
not only cleared the path, as URLParser doesn't check for data from
the data_payload() function (making the result 'data:#a'), but also
crashed the program, because we forbid having an empty MIME type when
we serialize to string.
With this change, 'data:' URLs are parsed like every other URL.
To decode the contents of a 'data:' URL, one needs to call
process_data_url() on it, which returns a struct containing the MIME
type along with the already-decoded data! :^)
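A hypothetical usage sketch based on the description above (exact
names, types, and error handling may differ):

```
URL url("data:text/plain;base64,aGVsbG8="sv);
auto data_url = TRY(url.process_data_url());
// data_url now carries the MIME type ("text/plain") and the
// already-decoded payload bytes ("hello").
```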
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').
This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.
data: URLs didn't have this problem, because we have a special case for
parsing them.
In order to follow the spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to a deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.
Making that change resulted in a whole bunch of fallout.
After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
I misunderstood the spec step for checking whether the host 'ends with
a number'. We can't simply check whether it ends with a number; this
check is actually an algorithm, which is required to avoid misdetecting
hosts that merely end with a digit as IPv4 hosts.
Implement this missing step, and add a test to cover this.
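A simplified sketch of the spec's check (the real algorithm also
accepts 0x-prefixed and other IPv4-number forms for the last label,
omitted here):

```
#include <cctype>
#include <string_view>
#include <vector>

// Only the *last dot-separated label* being a number makes the host an
// IPv4 candidate; merely ending in a digit does not.
bool ends_in_a_number(std::string_view host)
{
    std::vector<std::string_view> parts;
    size_t start = 0;
    for (size_t i = 0; i <= host.size(); ++i) {
        if (i == host.size() || host[i] == '.') {
            parts.push_back(host.substr(start, i - start));
            start = i + 1;
        }
    }
    if (!parts.empty() && parts.back().empty())
        parts.pop_back(); // tolerate a trailing dot, e.g. "example.com."
    if (parts.empty() || parts.back().empty())
        return false;
    for (unsigned char c : parts.back()) {
        if (!std::isdigit(c))
            return false;
    }
    return true;
}
// ends_in_a_number("127.0.0.1") == true
// ends_in_a_number("foo.bar2")  == false  ("bar2" is not a number)
```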
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.
Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
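For reference, a sketch of the spec's serialization behavior
(simplified; not the code from this commit): lowercase hex groups, with
the single longest run of two or more zero groups compressed to "::".

```
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>

std::string serialize_ipv6(std::array<uint16_t, 8> const& pieces)
{
    // Find the first longest run of >= 2 zero pieces (the "compress"
    // index from the spec).
    int compress = -1;
    int best_length = 1;
    for (int i = 0; i < 8;) {
        if (pieces[i] != 0) {
            ++i;
            continue;
        }
        int j = i;
        while (j < 8 && pieces[j] == 0)
            ++j;
        if (j - i > best_length) {
            best_length = j - i;
            compress = i;
        }
        i = j;
    }

    std::string output;
    bool skipping_zeroes = false;
    for (int i = 0; i < 8; ++i) {
        if (skipping_zeroes && pieces[i] == 0)
            continue;
        skipping_zeroes = false;
        if (i == compress) {
            output += (i == 0) ? "::" : ":";
            skipping_zeroes = true;
            continue;
        }
        char buffer[8];
        std::snprintf(buffer, sizeof(buffer), "%x", pieces[i]);
        output += buffer;
        if (i != 7)
            output += ':';
    }
    return output;
}
// serialize_ipv6({ 0, 0, 0, 0, 0, 0, 0, 1 }) == "::1"
```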
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.
Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
I was debugging a different issue in Ladybird, and noticed that
completing relative file URLs with URL::complete_url didn't seem to work
right. This test case covers both the working https case and the file
URL case fixed by the previous commit.
Change the name and return type of
`IPv6Address::to_deprecated_string()` to `IPv6Address::to_string()`
with return type `ErrorOr<String>`.
It will now propagate errors that occur when writing to the
StringBuilder.
There are two users of `to_deprecated_string()` that now use
`to_string()`:
1. `Formatter<IPv6Address>`: it now propagates errors.
2. `inet_ntop`: it now sets errno to ENOMEM and returns.