These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
Sticking this to the function source has multiple benefits:
- We instrument more code, by not excluding entire files.
- NO_SANITIZE_COVERAGE can be used in Header files.
- Keeping the info with the source code, means if a function or
file is moved around, the NO_SANITIZE_COVERAGE moves with it.
GCC sometimes complains about the The `no_sanitize("address")` syntax,
and clang sometimes complains abouth the `no_sanitize_address` syntax.
Both claim to support both, so that's neat!
On macOS, it's not trivial to get a Mach task port for your children.
This implementation registers the chrome process as a well-known
service with launchd based on its pid, and lets each child process
send over a reference to its mach_task_self() back to the chrome.
We'll need this Mach task port right to get process statistics.
For example, consider the following code snippet:
Vector<Function<void()>> m_callbacks;
void add_callback(Function<void()> callback)
{
m_callbacks.append(move(callback));
}
// Somewhere else...
void do_something()
{
int a = 10;
add_callback([&a] {
dbgln("a is {}", a);
});
} // Oops, "a" is now destroyed, but the callback in m_callbacks
// has a reference to it!
We now statically detect the capture of "a" in the lambda above and flag
it as incorrect. Note that capturing the value implicitly with a capture
list of `[&]` would also be detected.
Of course, many functions that accept Function<...> don't store them
anywhere, instead immediately invoking them inside of the function. To
avoid a warning in this case, the parameter can be annotated with
NOESCAPE to indicate that capturing stack variables is fine:
void do_something_now(NOESCAPE Function<...> callback)
{
callback(...)
}
Lastly, there are situations where the callback does generally escape,
but where the caller knows that it won't escape long enough to cause any
issues. For example, consider this fake example from LibWeb:
void do_something()
{
bool is_done = false;
HTML::queue_global_task([&] {
do_some_work();
is_done = true;
});
HTML::main_thread_event_loop().spin_until([&] {
return is_done;
});
}
In this case, we know that the lambda passed to queue_global_task will
be executed before the function returns, and will not persist
afterwards. To avoid this warning, annotate the type of the capture
with IGNORE_USE_IN_ESCAPING_LAMBDA:
void do_something()
{
IGNORE_USE_IN_ESCAPING_LAMBDA bool is_done = false;
// ...
}
All string types currently have to invoke this function as:
stream.write_until_depleted("foo"sv.bytes());
This isn't very ergonomic, but more importantly, this overload will
allow String/ByteString instances to be written in this manner once
e.g. `ByteString::view() &&` is deleted.
Rather than making a copy of the held string, this returns a reference
so that expressions like the following:
do_something(json.as_string().view());
are not disallowed once `ByteString::view() &&` is deleted.
JPEG2000 is the last image format used in PDF filters that we
don't have a loader for. Let's change that.
This adds all the scaffolding, but no actual implementation yet.
Although it has some interesting properties, SipHash is brutally slow
compared to our previous hash function. Since its introduction, it has
been highly visible in every profile of doing anything interesting with
LibJS or LibWeb.
By switching back, we gain a 10x speedup for 32-bit hashes, and "only"
a 3x speedup for 64-bit hashes.
This comes out to roughly 1.10x faster HashTable insertion, and roughly
2.25x faster HashTable lookup. Hashing is no longer at the top of
profiles and everything runs measurably faster.
For security-sensitive hash tables with user-controlled inputs, we can
opt into SipHash selectively on a case-by-case basis. The vast majority
of our uses don't fit that description though.
This was added in commit f2663f477f as a
partial implementation of what is now LibWeb's forgiving Base64 decoder.
All use cases within LibWeb that require whitespace skipping now use
that implementation instead.
Removing this feature from AK allows us to know the exact output size of
a decoded Base64 string. We can still trim whitespace at the start and
end of the input though; for example, this is useful when reading from a
file that may have a newline at the end of the file.
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone), it just doesn't perform
the formatting to east-const itself.
If we already have a FlyString instantiated for the given string,
look that up and return it instead of making a temporary String just to
use as a key into the FlyString table.
Before this change, the global FlyString table looked like this:
HashMap<StringView, Detail::StringBase>
After this change, we have:
HashTable<Detail::StringData const*, FlyStringTableHashTraits>
The custom hash traits are used to extract the stored hash from
StringData which avoids having to rehash the StringView repeatedly like
we did before.
This necessitated a handful of smaller changes to make it work.
There's no need to copy the result. We can also avoid increasing the
size of the output buffer by 1 for each written byte.
This reduces the runtime of `./bin/base64 -d enwik8.base64 >/dev/null`
from 0.917s to 0.632s.
(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
We don't really need the features provided by StringBuilder here, since
we know the exact size of the output. Avoiding StringBuilder avoids the
recurring capacity/size checks both within StringBuilder itself and its
internal ByteBuffer.
This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
0.976s to 0.428s.
(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
We know we are only appending ASCII characters to the StringBuilder, so
do not bother validating the result.
This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
1.192s to 0.976s.
(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
This encoding scheme comes from section 5 of RFC 4648, as an
alternative to the standard base64 encode/decode methods.
The only difference is that the last two characters are replaced
with '-' and '_', as '+' and '/' are not safe in URLs or filenames.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
This was copying the vector behind our backs, let's remove it and make
the copying explicit by putting it behind COWVector::mutable_at().
This is a further 64% performance improvement on Wasm validation.
Otherwise, we percent-encode negative signed chars incorrectly. For
example, https://www.strava.com/login contains the following hidden
<input> field:
<input name="utf8" type="hidden" value="✓" />
On submitting the form, we would percent-encode that field as:
utf8=%-1E%-64%-6D
Which would cause us to receive an HTTP 500 response. We now properly
percent-encode that field as:
utf8=%E2%9C%93
And can login to Strava :^)
We already have a helper to split a StringView by line while considering
"\n", "\r", and "\r\n". Add an analagous method to just count the number
of lines in the same manner.