This makes it possible to use MakeIndexSequence in functions like:
template<typename T, size_t N>
constexpr auto foo(T (&a)[N])
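For illustration, a body for such a function might look like this (a
hypothetical sketch, not code from the commit; assumes AK's
IndexSequence/MakeIndexSequence spellings):
```cpp
template<typename T, size_t N>
constexpr auto foo(T (&a)[N])
{
    // Fold over the array at compile time using an index sequence whose
    // indices are size_t, matching the array extent N.
    return [&]<size_t... Is>(IndexSequence<Is...>) {
        return (a[Is] + ... + T {});
    }(MakeIndexSequence<N> {});
}
```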
This means AK/StdLibExtraDetails.h must now include AK/Types.h
for size_t, which means AK/Types.h can no longer include
AK/StdLibExtras.h (which arguably it shouldn't do anyways),
which requires rejiggering some things.
(IMHO Types.h shouldn't use AK::Details metaprogramming at all.
FlatPtr doesn't necessarily have to use Conditional<> and ssize_t could
maybe be in its own header or something. But since it's tangential to
this PR, going with the tried and true "lift things that cause the
cycle up to the top" approach.)
Instead of polluting the global namespace with definitions from
libkern/OSByteOrder.h and machine/endian.h on macOS, just use AK
functions for conversions.
When swapping arguments to put positional ones toward the end,
m_arg_index was pointing at "last arg index" + "skipped args" +
"consumed args" and thus was pointing ahead of the skipped ones.
m_arg_index now points after the current parsed option arguments.
Now it actually only exposes methods to allocate uninitialized storage
and to create a substring with a shared superstring. All the details of
the memory layout are fully encapsulated.
The idea is to eventually get rid of protected state in StringBase. To
do this, we first need to remove all references to m_data and
m_short_string from String.
This starts separating memory management of string data and string
utilities like `String::formatted`. This would also allow reusing the
same storage in `DeprecatedString` in the future.
This method (unlike can_read_line) ensures that the delimiter is present
in the buffer, and doesn't return true after eof when the delimiter is
absent.
`JsonValue::to_byte_string` has peculiar type-erasure semantics which are
not usually intended. Unfortunately, it also has a very stereotypical
name which does not warn about the unexpected behavior. So let's prefix
it with `deprecated_` to make new code use `as_string` if it just wants
the string value, or `serialized<StringBuilder>` if it needs to do
proper serialization.
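For example (a rough sketch of the intent, assuming JsonValue's usual
constructors):
```cpp
JsonValue value { 42 };

// Old: silently stringifies a non-string value into "42".
auto stringified = value.deprecated_to_byte_string();

// New code should say what it means:
auto json_text = value.serialized<StringBuilder>(); // proper serialization

JsonValue name { "foo" };
auto string_value = name.as_string(); // only for values that really are strings
```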
A bunch of users used consume_specific with a constant ByteString
literal, which can be replaced by an allocation-free StringView literal.
The generic consume_while overload gains a requires clause so that
consume_specific("abc") causes a more understandable and actionable
error.
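A typical call site looks something like this (illustrative, not taken
from the diff):
```cpp
GenericLexer lexer("true false"sv);

// Before: allocates a ByteString just for the comparison.
// lexer.consume_specific(ByteString("true"));

// After: a StringView literal compares without allocating.
lexer.consume_specific("true"sv);
```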
The current algorithm is O(N^2) because we forward-search an
ever-increasing substring of the haystack. This implementation reduces
the search time of a 500,000-length string (where the desired needle is
at index 0) from 72 seconds to 2-3 milliseconds.
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone); it just doesn't perform
the formatting to east-const itself.
Using a vector to represent a list of painting commands results in many
reallocations, especially on pages with a lot of content.
This change addresses it by introducing a SegmentedVector, which allows
fast appending by representing a list as a sequence of fixed-size
vectors. Currently, this new data structure supports only the
operations used in RecordingPainter, which are appending and iterating.
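A rough sketch of the idea (the real AK interface may differ):
```cpp
#include <AK/NonnullOwnPtr.h>
#include <AK/Vector.h>

template<typename T, size_t SegmentSize = 1024>
class SegmentedVector {
public:
    void append(T value)
    {
        if (m_segments.is_empty() || m_segments.last()->size() == SegmentSize) {
            auto segment = make<Vector<T>>();
            segment->ensure_capacity(SegmentSize);
            m_segments.append(move(segment));
        }
        // Appending never moves previously appended elements.
        m_segments.last()->unchecked_append(move(value));
    }

    template<typename Callback>
    void for_each(Callback callback) const
    {
        for (auto const& segment : m_segments)
            for (auto const& value : *segment)
                callback(value);
    }

private:
    Vector<NonnullOwnPtr<Vector<T>>> m_segments;
};
```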
Much of the UTF-8 data that we'll iterate over will be ASCII only,
and we can get a significant speed-up by simply having a fast path
when the iterator points at a byte that is obviously an ASCII character
(<= 0x7F).
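Roughly the shape of the fast path (simplified; the multi-byte branch
stands in for the existing decoding logic):
```cpp
u32 decode_code_point(u8 const* bytes)
{
    // A byte <= 0x7F is a complete one-byte code point, so we can skip
    // all of the length/continuation-byte handling.
    if (bytes[0] <= 0x7F)
        return bytes[0];
    return decode_multi_byte_code_point(bytes); // existing slow path (stand-in name)
}
```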
Since we're already building up a percent-encoded ASCII-only string
in the internal parser buffer, there's no need to do a second UTF-8
validation pass before assigning each part of the parsed URL.
This makes URL parsing significantly faster.
Instead of doing a wrappy MUST(try_append_code_point()), we now inline
the UTF-8 encoding logic. This allows us to grow the buffer by the
right increment up front, and also removes a bunch of ErrorOr ceremony
that we don't care about.
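The inlined encoding is the standard 1-4 byte UTF-8 scheme, roughly
(where `buffer` stands in for the parser's internal buffer):
```cpp
// The number of bytes is known from the code point's value, so the buffer
// can be grown by exactly the right amount up front.
if (code_point <= 0x7F) {
    buffer.append(static_cast<u8>(code_point));
} else if (code_point <= 0x7FF) {
    buffer.append(static_cast<u8>(0xC0 | (code_point >> 6)));
    buffer.append(static_cast<u8>(0x80 | (code_point & 0x3F)));
} else if (code_point <= 0xFFFF) {
    buffer.append(static_cast<u8>(0xE0 | (code_point >> 12)));
    buffer.append(static_cast<u8>(0x80 | ((code_point >> 6) & 0x3F)));
    buffer.append(static_cast<u8>(0x80 | (code_point & 0x3F)));
} else {
    buffer.append(static_cast<u8>(0xF0 | (code_point >> 18)));
    buffer.append(static_cast<u8>(0x80 | ((code_point >> 12) & 0x3F)));
    buffer.append(static_cast<u8>(0x80 | ((code_point >> 6) & 0x3F)));
    buffer.append(static_cast<u8>(0x80 | (code_point & 0x3F)));
}
```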
Once we know that a code point must be a valid ASCII character,
we now cast it to `char` and avoid the expensive generic
StringView::contains(u32 code_point) checks.
This dramatically speeds up URL parsing.
UTF8Decoder was already replacing invalid data with replacement
characters during conversion, so we know for sure we have valid UTF-8
by the time conversion is finished.
This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
Instead of using a StringBuilder, add a String::repeated(String, N)
overload that takes advantage of knowing it's already all UTF-8.
This makes the following microbenchmark go 4x faster:
"foo".repeat(100_000_000)
And for single character strings, we can even go 10x faster:
"x".repeat(100_000_000)
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
opt = view.to_int<I>();
else
opt = view.to_uint<I>();
```
for us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.
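With the unified API, the call site collapses to something like:
```cpp
// Works for both signed and unsigned I, returning Optional<I>.
auto opt = view.to_number<I>();
```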
Half the functions used are not readily available on Windows. Instead of
creating more ifdef soup, this commit simply disables the rich debug
stuff on Windows.
If we don't have __builtin_add_overflow_p(), we can also try using
__builtin_add_overflow(). This makes debug builds with Clang
significantly faster since they no longer need to use the generic
implementation. Same for multiplication.
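Roughly the shape of the check (function name and exact signature are
illustrative; the final generic fallback is omitted here):
```cpp
template<typename T>
constexpr bool addition_would_overflow(T a, T b)
{
#if __has_builtin(__builtin_add_overflow_p)
    return __builtin_add_overflow_p(a, b, (T)0);
#else
    // Clang has no _p variant; the plain builtin needs a result object,
    // but still avoids the slow generic implementation in debug builds.
    T result {};
    return __builtin_add_overflow(a, b, &result);
#endif
}
```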
Previously, `replace` used `find_all` to find all of the positions to
replace. But `find_all` finds all the *overlapping* instances of the
needle, while `replace` assumed that the next position was always at
least `needle.length()` away from the last one. This led to crashes like
https://github.com/SerenityOS/jakt/issues/1159.
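A minimal illustration of the mismatch:
```cpp
// find_all() reports overlapping hits: for needle "aa" in "aaa" it returns
// positions { 0, 1 }. replace() assumed the next hit started at least
// needle.length() past the previous one, so it could run past the end of
// the haystack.
auto positions = "aaa"sv.find_all("aa"sv);
```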
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
This requires duplicating some logic from Core::Process::get_name()
into AK, which seems unfortunate. But for now, this greatly improves the
log messages for testing Ladybird on Linux.
The feature is hidden behind a runtime flag with a global setter in the
same way that totally enabling/disabling dbgln is.
This tries to optimize the refill code by making it easier to digest for
the branch predictor. This includes not looping as much across function
calls and marking our EOF case to be unlikely.
Co-Authored-By: Lucas Chollet <lucas.chollet@free.fr>
This is an extension of cc0b970d but for the reference-handling
specialization of Optional.
This basically allows us to write code like:
```cpp
Optional<u8&> opt;
opt = OptionalNone{};
```
Previously, we were returning an empty optional if the key contained a
numerical value that was not stored as a double. Stop doing that and
rename the method to signify the change in behavior.
Apparently, this fixes a bug in an InspectorWidget in Ladybird on
Serenity: it showed 0 for element boxes with integer sizes.
ASAN was crying way too much when running the LibJS JIT since the old
OFFSET_OF implementation was too wild for its liking.
By turning off the invalid-offsetof warnings, we can use the offsetof
builtin instead. However, I'm leaving this as a wrapper macro, since
we may still want to do something different for other compilers.
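With the warning turned off, the wrapper can boil down to something like:
```cpp
// Kept as a macro so other compilers can get a different definition later.
#define OFFSET_OF(class_, member) __builtin_offsetof(class_, member)
```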
The streams and other common APIs require byte spans to operate on
arbitrary data. This is less than helpful when wanting to serialize
spans of other data types, such as from an Array or Vector of u32s.
There were two issues:
1) the C+=R and C-=R operators expected arithmetic types to have .real()
2) the R+C, R-C, R*C and R/C operators applied the operation in the wrong
order (did C+R, C-R, C*R and C/R instead). This wouldn't matter for
+ and * which are commutative, but is incorrect for - and /.
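For example (illustrative):
```cpp
Complex<double> c { 1.0, 2.0 };

auto sum = 3.0 + c;        // fine either way, addition commutes
auto difference = 3.0 - c; // must be { 2.0, -2.0 }; the old code effectively
                           // computed c - 3.0 and returned { -2.0, 2.0 }
```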
This is a bit spammy now that we are performing some overload resolution
at build time. The fallback to an interface has generally worked fine on
the types it warns about (BufferSource, Module, etc.) so let's not warn
about it for every build.
Let's replace this bool with an `enum class` in order to enhance
readability. This is done by repurposing `MappedFile`'s `OpenMode` into
a shared `enum` simply called `Mode`.
Previously, `URLParser` was constructing a new String for every
character of the URL's username and password. This change improves
performance by eliminating those unnecessary String allocations.
A URL with a 100,000 character password can now be parsed in ~30ms vs
~8 seconds previously on my machine.
I added some spec comments and implementation notes; this should not
change behavior in a significant way.
The previous code was quite unwieldy and repetitive.
The long `if(next_is('X'))` chain is now a smaller `switch`.
I also reinstated the fast path for long sequences of literal
characters, which was broken in 0aad21fff2.
Consider the following:
JsonValue value { JsonValue::Type::Object };
value.as_object().set("foo"sv, "bar"sv);
The JsonValue(Type) constructor does not initialize the underlying union
that stores its value. Thus JsonValue::as_object() will A) refer to an
uninitialized union member, and B) dereference that member.
This constructor only has 2 users, both of which initialize the type to
Type::Null. Rather than implementing unused functionality here, replace
those uses with the default JsonValue constructor, and remove the faulty
constructor.
The fun pattern of `union { struct { u32 a : 1; u64 b : 7; }; u8 x; }`
produces complete garbage on Windows. This commit fixes the two
instances of that pattern that exist in AK.
This commit also makes sure that the generated unions have the correct
size (whereas FloatExtractor<f32> previously did not!) by adding some
nice static_asserts.
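The added checks are along these lines:
```cpp
static_assert(sizeof(FloatExtractor<f32>) == sizeof(f32));
static_assert(sizeof(FloatExtractor<f64>) == sizeof(f64));
```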
Previously we assumed a default precision of 6, which made the printed
values quite odd in some cases.
This commit changes that default to print them with just enough
precision to produce the exact same float when roundtripped.
This commit adds some new tests that assert exact format outputs, which
have to be modified if we decide to change the default behaviour.
The slugify function is used to convert input into URL-friendly slugs.
It processes each character in the input, keeping ASCII alpha
characters (after lowercasing them) and replacing non-alphanumeric
characters with the glue character, or with a single space when
multiple spaces are encountered consecutively.
The resulting string is trimmed of leading and trailing whitespace, and
any internal whitespace is replaced with the glue character.
It is currently used in LibMarkdown headings generation code.
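Roughly the expected behaviour (the exact signature and error handling
may differ from the real API; '-' is assumed to be the default glue):
```cpp
slugify("Hello, World!"_string); // -> "hello-world"
slugify("My Heading 1"_string);  // -> "my-heading-1"
```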
The backtrace execinfo API takes the number of addresses the result
buffer can hold instead of its size, for some reason. Previously
backtraces larger than 256 frames deep would write past the end of the
result buffer.
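In other words, the second argument is a slot count, not a byte count:
```cpp
#include <execinfo.h>

void* frames[256];
int count = backtrace(frames, 256);               // correct: number of entries
// int count = backtrace(frames, sizeof(frames)); // wrong: on LP64 this claims
//                                                // 2048 entries and writes past
//                                                // the end of the array
```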
Instead of implementing this inline, put it into a function. Use this
new function to correctly implement shortening paths for some places
where this logic was previously missing.
Before these changes, the pathname for the included test was incorrectly
being set to '/' as we were not considering the Windows drive letter.
Fixed the issue in StringUtils::convert_to_floating_point() where the
end pointer of the trimmed string was not being passed, causing the
function to consistently return 'None' when given strings with trailing
whitespace.
There were 2 issues with the way we formatted floating point decimals:
if the part after the decimal point exceeded the max of a u64, we would
generate wildly incorrect decimals, and we applied no rounding.
With this new code, we emit decimals one by one and perform a simple
reverse string walk to round the number up if required.
Prior to this commit, constructing a DS from a null DFS would cause a
nullptr deref, which broke (at least) Profiler.
This commit converts the null DFS to an empty DS, avoiding the nullptr
deref (until DFS loses its null state, or we decide to not make it
convertible to a DS).
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>
Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
This function was an artifact from when we were using DeprecatedString
much more in URL class before porting to Optional<String>, where we no
longer rely on the null state of DeprecatedString.
The original doc comment was mistakenly copy-pasted from
count_leading_zeroes_safe and was therefore incorrect: the function is
counting _trailing_ zeroes, not _leading_ zeroes.
The mentioned functions used m_size / 8 instead of size_in_bytes()
(division with ceiling rounding mode), which resulted in an off-by-one
error such that the functions didn't search in the final byte when the
bitmap's size is not a multiple of 8.
Using size_in_bytes() instead of m_size / 8 fixes this.
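Concretely (sketch):
```cpp
// With m_size == 20:
//   m_size / 8      == 2  -> the last 4 bits were never searched
//   size_in_bytes() == 3  -> the partial final byte is included
size_t size_in_bytes() const { return ceil_div(m_size, static_cast<size_t>(8)); }
```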
When working with FixedMemoryStreams, and especially MappedFiles, you
may not want to copy the underlying data when you read from the
stream. Pointing into that data is perfectly fine as long as you know
the lifetime of it is long enough.
This commit adds a couple of methods for reading either a single value,
or a span of them, in this way. As noted, for single values you sadly
get a raw pointer instead of a reference, but that's the only option
right now.
Previously, the first match index was not checked to see if the camel
case or separator bonuses applied. The camel case bonus could also be
incorrectly applied where strings had non-alphabetical characters.
SipHash is highly HashDoS-resistant, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
Instead of ballooning the size of the Function object, simply place the
callable on the heap with a properly aligned address.
This brings the alignment of Function down from ridiculous values like
64 bytes to a manageable 8 bytes.
…kernel"
This reverts commit d4d92184b3.
The alignment requirements imposed by this are overkill at best and
ridiculous at worst; a future commit will tackle this problem in a
different, more space-efficient way.
This commit also reverts db5ad0c since code outside of the web spec
expects serialized paths to be percent decoded.
Also, there are issues trying to implement the concept "opaque
path". For now, we still use the old cannot_be_a_base_url(), but its
usage needs to be removed in favor of a has_opaque_path() as the spec
has changed since then.
...requested size"
This reverts commit 13573a6c4b.
Some clients of `BufferedStream` expect a non-blocking read by
`read_some` which the commit above made impossible by potentially
performing a blocking read. For example, the following command hangs:
pro http://ipecho.net/plain
This is caused by the HTTP job expecting to read the body of the
response which is included in the active buffer, but since the buffered
data size is less than the buffer passed into `read_some`, another
blocking read is performed before anything is returned.
By reverting this commit, all tests still pass and `pro` no longer
hangs. Note that because of another bug related to `stdout` / `stderr`
mixing and the absence of a line ending, there is no output to the
command above except a progress update.
This is defined when building on GNU/Hurd, the GNU operating system with
the Hurd as its kernel (as it was designed originally, before Linux and
GNU/Linux came to be).
Also, define the corresponding part of User-Agent.
This is defined when we're compiling against the GNU C Library, whether
on Linux or on other kernels that glibc works on. More AK_LIBC_xxxx
definitions could potentially be added in the future.
Also add AK_LIBC_GLIBC_PREREQ(), which checks for a specific glibc
version.
While macOS backtrace(3) puts a space directly after the mangled symbol
name, some versions of glibc put a + directly after it. This new logic
accounts for both situations when trying to demangle.
Co-Authored-By: Andreas Kling <kling@serenityos.org>
On platforms that support it, enable using ``<execinfo.h>`` to get
backtrace(3) to dump a backtrace on assertion failure. This should make
debugging things like WebContent crashes in Lagom much easier.
This part of the spec is mostly useful for our debugging for now, but
could eventually be hooked up so that the user can see any reported
validation errors.
The main change here is that we now follow the spec by asserting the URL
is not special. Previously I could not find a way to enable this without
getting assertions when browsing the web - but that seems to no longer
be the case (probably from some other fixes which have since been made).
The main change is the simplification of the expression
`(10^precision * fraction) / 2^precision` to `5^precision * fraction`
(possible because `10^precision / 2^precision == 5^precision`).
Whether those expressions overflow depends on the values of `precision`
and `fraction`. For the maximum value of `fraction`, the following table
shows for which value of `precision` overflow will occur.
      Old  New
u32     8   10
u64    15   20
u128   30   39
As of now, the `u64` type is used to calculate the result of the
expression, meaning that before, only FixedPoints with a `precision` of
less than 15 could be accurately rendered in decimal (for every value of
fraction). Now, this limit increases to 20.
This refactor also fixes broken decimal rendering when a precision width
is explicitly specified in the format string, as well as broken
hexadecimal rendering.
By default, `1` is of type `int`, which is 32 bits wide on our targets.
Because of that, if the `precision` of a `FixedPoint` is greater than 32,
the expression `1 << precision` cannot produce the intended value and the
result ends up being zero. Casting `1` to the wider underlying type
makes the expression not overflow.
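For example, with a 64-bit underlying type:
```cpp
int precision = 40;

// Buggy: '1' is a 32-bit int, so this cannot produce 2^40.
// auto scale = 1 << precision;

// Fixed: widen to the underlying type first, then shift.
auto scale = static_cast<i64>(1) << precision; // 0x100'0000'0000
```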
Because of the off-by-one error, the second bit of the fraction was
getting ignored in differentiating between fractions equal to 0.5 or
greater than 0.5. This resulted in numbers like 2.75 being considered
as having fraction equal to 0.5 and getting rounded incorrectly (to 2).
This new Kernel StdLib function will be used to copy contents of a
FixedStringBuffer with a null character to a user process.
The first user of this new function is the prctl option of
PR_GET_PROCESS_NAME which would copy a process name including a null
character to a user provided buffer.
There was a small mishmash of argument order, as seen on the table:
                 | Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
================ | ======================= | =======================
uses equals()    | HashMap                 | Vector, HashTable
defines equals() | *String[^1]             | ByteBuffer
[^1]: String, DeprecatedString, their Fly-type equivalents and KString.
This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.
I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_search). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.
With this change, each of the following lines will now compile
successfully:
Vector<String>().contains_slow("WHF!"sv);
HashTable<String>().contains("WHF!"sv);
HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
The proper syntax for defining user-defined literals does not require a
space between the `operator""` token and the operator name:
> error: identifier 'sv' preceded by whitespace in a literal operator
> declaration is deprecated
This change introduces HeapFunction, which is intended to be used as a
replacement for SafeFunction. The new type behaves like a regular
GC-allocated object, which means it needs to be visited from
visit_edges, and unlike SafeFunction, it does not create new roots for
captured parameters.
Co-Authored-By: Andreas Kling <kling@serenityos.org>
IFF was a generic container file format that was popular on the Amiga
since it was the only file format supported by Deluxe Paint.
ILBM is an image format popular in the late eighties/nineties
that uses the IFF container.
This is a very first version of the decoder that only supports
(byterun) compressed files with bpp <= 8.
Only the minimal chunks are decoded: CMAP, BODY, BMHD.
I am planning to add support for the following variants:
- EHB (32 colours + lighter 32 colours)
- HAM6 / HAM8 (special modes that allowed displaying the whole Amiga
  palette of 4096 / 262 144 colours)
- TrueColor (24bit)
Things that could be fun to do:
- Still images could be animated using color cycle information
The web specs do not expect decoding or encoding to happen when calling
these helpers. This allows us to remove the raw_fragment helper function
from the URL class.
This encoder can handle all integer formats and sample rates, though it
only handles two channels well. It uses fixed LPC and performs a
close-to-optimal parameter search on the LPC order and residual Rice
parameter, leading to decent compression already.
Just like with input buffered streams, we don't currently have a use
case for output buffered streams which aren't seekable, since the main
applications are files.
To ensure this happens without duplicating code, we allow forcing a
StringBuilder object to only use the inline buffer, so the code in the
AK/Format.cpp file doesn't need to deal with different underlying
storage types (expandable or inline-fixed) at all.
I couldn't run the parser in a debugger like I normally would, so I
added printouts to understand where the parser is failing.
More could be added, but these are enough to get a good idea of what
the parser is doing. It's very spammy, though, so enable it by flicking
the IMAP_PARSER_DEBUG switch :^)
Instead, use the FixedCharBuffer class to ensure we always use static
buffer storage for these names. This guarantees that once a Process or a
Thread has been created, setting a new name will never fail, as only
copying of strings is done into that static storage.
The limits are 32 characters for process names and 64 characters for
thread names - this is because thread names could be more verbose than
process names.
This class encapsulates a fixed Array with compile-time size definition
for storing ASCII characters.
There are also new Kernel StdLib functions to copy user data into such
objects so this class will be useful later on.
The spec indicates we should support serializing opaque hosts, but we
were assuming the host contained a String. Opaque hosts are represented
with Empty. Return an empty string here instead to prevent crashing on
an invalid variant access.
Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
This could happen if a sequence of '0' parts was followed by a longer
sequence of '0' parts at the end of the host. The first sequence was
being used for the compress, and not the second.
For example, [1:1:0:0:1:0:0:0] was being serialized as: [1:1::1:0:0:0]
instead of [1:1:0:0:1::].
Fix this by checking at the end of the loop if we are in the middle of a
sequence of '0' parts that is longer than the current longest.
Every caller only ever expects percent encoding to occur, so let's just
remove this flag altogether. At the same time, replace some
DeprecatedString with StringView.
Parsing 'data:' URLs took its own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.
Because parsing 'data:' didn't use standard fields, running the
following JS code:
new URL('#a', 'data:text/plain,hello').toString()
not only cleared the path (URLParser doesn't check for data from the
data_payload() function, making the result 'data:#a'), but also crashed
the program, because we forbid having an empty MIME type when we
serialize to string.
With this change, 'data:' URLs are parsed like every other URL.
To decode the contents of a 'data:' URL, one needs to call
process_data_url() on the URL, which returns a struct containing the
MIME type along with the already-decoded data! :^)
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').
This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.
data: URLs didn't have this problem, because we have a special case for
parsing them.
This is defined in the spec, but was missing in our table. Fix this, and
add a spec comment for what is missing. Also begin a basic text-based
test for URL, so we can get some coverage of LibWeb's usage of URL too.
This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
- a
| - b
| | - c
| | | - <accept>
| | - <accept>
| - c
| | - <accept>
which will attempt to match 'a' or 'b' only once, and also limits
the number of backtracks performed in case alternatives fail to
match.
This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated, which is pessimistic for
small patterns, though the change in performance in such patterns is not
particularly large.
Similar to floor and ceil, we were forcing the values to be doubles here.
Also adds a big FIXME about my findings trying to add a general
implementation for these.