Commit graph

3193 commits

Author SHA1 Message Date
Karol Kosek
eb41f0144b AK: Decode data URLs to separate class (and parse like every other URL)
Parsing 'data:' URLs took it's own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.

Because parsing 'data:' didn't use standard fields, running the
following JS code:

    new URL('#a', 'data:text/plain,hello').toString()

not only cleared the path as URLParser doesn't check for data from
data_payload() function (making the result be 'data:#a'), but it also
crashes the program because we forbid having an empty MIME type when we
serialize to string.

With this change, 'data:' URLs will be parsed like every other URLs.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which will return a struct containing MIME type with already
decoded data! :^)
2023-08-01 14:19:05 +02:00
Karol Kosek
58017a0581 AK: Clear buffer after leaving CannotBeABaseUrlPath in URLParser
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').

This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.

data: URLs didn't have this problem, because we have a special case for
parsing them.
2023-08-01 10:10:07 +02:00
Shannon Booth
aa7ca80d7c AK: Fix missing step step for serialization of IPv6 hosts
This was resulting in the incorrect host serialization of:

http://[0:1:0:1:0:1:0:1] to [::1:0:1:0:1:0:1]

and:

http://[1:0:1:0:1:0:1:0] to [1::1:0:1:0:1:0]
2023-07-31 14:48:24 +02:00
Shannon Booth
4fdd4dd979 AK: Add missing default port definitions for FTP scheme URLs
This is defined in the spec, but was missing in our table. Fix this, and
add a spec comment for what is missing. Also begin a basic text based
test for URL, so we can get some coverage of LibWeb's usage of URL too.
2023-07-31 14:48:24 +02:00
Ali Mohammad Pur
4e69eb89e8 LibRegex: Generate a search tree when patterns would benefit from it
This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
    - a
    | - b
    | | - c
    | | | - <accept>
    | | - <accept>
    | - c
    | | - <accept>
which will attempt to match 'a' or 'b' only once, and would also limit
the number of backtrackings performed in case alternatives fails to
match.

This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated, which is pessimistic for
small patterns, though the change in performance in such patterns is not
particularly large.
2023-07-31 05:31:33 +02:00
Ali Mohammad Pur
7a471b7cf5 AK: Allow customising Trie's underlying map type
This makes it possible to use an ordered map and keep the insertion
order intact.
2023-07-31 05:31:33 +02:00
Hendiadyoin1
07e4358c63 AK: Use correct builtins for fmod and remainder
Similar to floor and ceil, we were forcing the values to be doubles here

Also adds a big FIXME about my findings trying to add a general
implementation for these
2023-07-31 05:22:12 +02:00
Hediadyoin1
c9808f0d4a AK: Use correct builtins for floor and ceil
We were using the double ones, forcing casts to and from them for floats
2023-07-31 05:22:12 +02:00
Hediadyoin1
29e0494e56 AK: Use builtins in fabs implementation and move it to the top
Both GCC and Clang inline this function to use bit-wise logic and/or
appropriate instructions even on -O0 and allow their use in a constexpr
context, see
https://godbolt.org/z/de1393vha
2023-07-31 05:22:12 +02:00
Hediadyoin1
594369121a AK: Add trunc and rint to AK/Math.h
These are useful in some algorithms, which require specific rounding.
2023-07-31 05:22:12 +02:00
Hediadyoin1
6573ace8f1 AK: Move rounding function to the top of AK/Math.h
These are useful in other algorithms, so lets move them up
2023-07-31 05:22:12 +02:00
Shannon Booth
8751be09f9 AK: Serialize URL hosts with 'concept-host-serializer'
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.

Making that change resulted in a whole bunch of fallout.

After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
2023-07-31 05:18:51 +02:00
Shannon Booth
768f070b86 AK: Implement 'concept-host-serializer' in URL spec
This implementation will allow us to fix serialization of IPv6
addresses not being surrounded by '[' and ']'.

Nothing is calling this function yet - this will come in the next
(larger) commit where the underlying host representation inside of
AK::URL is changed from DeprecatedString to URL::Host.
2023-07-31 05:18:51 +02:00
Shannon Booth
803ca8cc80 AK: Make serialize_ipv6_address take a StringBuilder
This will allow us to implement 'concept-host-serializer' without
needing to call String::formatted.
2023-07-31 05:18:51 +02:00
Shannon Booth
a1ae701a7d AK: Move URL::cannot_have_a_username_or_password_or_port out of line
This doesn't seem trivial enough to be defining in the header like this,
and should not be a performance critical function anyhow.

Also add spec comments while we are at it, and a FIXME since we do not
seem to exactly align.
2023-07-31 05:18:51 +02:00
Shannon Booth
0c0117fc86 AK: Add typdefs for host URL definitions
And use them where applicable. This will allow us to store the host in
the deserialized format as the spec specifies.

Ideally these typdefs would instead be the existing AK interfaces, but
in the meantime, we can just use this.
2023-07-31 05:18:51 +02:00
Andrew Kaster
f5e8bba092 AK: Add argument to LexicalPath::basename to strip the extension 2023-07-30 17:50:44 -06:00
Sam Atkins
3f7d97f098 AK+Libraries: Remove FixedMemoryStream::[readonly_]bytes()
These methods are slightly more convenient than storing the Bytes
separately. However, it it feels unsanitary to reach in and access this
data directly. Both of the users of these already have the
[Readonly]Bytes available in their constructors, and can easily avoid
using these methods, so let's remove them entirely.
2023-07-30 19:32:52 +01:00
Shannon Booth
bf7af25a82 AK: Allow testing Empty instances for equality
This also makes it possible to compare `Variant<Empty, Ts...>`
objects if operator== exists for all Ts
2023-07-28 20:47:48 +03:30
kleines Filmröllchen
a0705202ea Kernel/Ext2: Write superblock backups
We don't ever read them out, but this should make fsck a lot less mad.
2023-07-28 14:51:07 +02:00
Lucas CHOLLET
18b7ddd0b5 AK: Rename the const overload of FixedMemoryStream::bytes()
Due to overload resolutions rules, this simple code provokes a crash:

ReadonlyBytes readonly_bytes{};
FixedMemoryStream stream{readonly_bytes};
ReadonlyBytes give_them_back{stream.bytes()};
    // -> Panics on VERIFY(m_writing_enabled);
    // but this is fine:
auto bytes = static_cast<FixedMemoryStream const&>(*stream).bytes()

If we need to be explicit about it, let's rename the overload instead of
adding that `static_cast`.
2023-07-27 14:40:00 +01:00
Shannon Booth
7b3902e3d5 AK: Remove unused URL::scheme_requires_port
THis function does not seem to be used anywhere, and I cannot find any
spec equivalent for this function.
2023-07-25 06:43:50 -04:00
Shannon Booth
c8da880806 AK: Add spec comments to URL::serialize 2023-07-25 06:43:50 -04:00
Shannon Booth
177b04dcfc AK: Fix url host parsing check for 'ends in a number'
I misunderstood the spec step for checking whether the host 'ends with a
number'. We can't simply check for it if ends with a number, this check
is actually an algorithm which is required to avoid detecting hosts that
end with a number from an IPv4 host.

Implement this missing step, and add a test to cover this.
2023-07-25 06:43:50 -04:00
Aliaksandr Kalenik
d216621d2a AK: Add clamp_to_int(value) in Math.h
clamp_to_int clamps value to valid range of int values so resulting
value does not overflow.

It is going to be used to clamp float or double values to int that
represents fixed-point value of CSSPixels.
2023-07-25 11:52:02 +02:00
Shannon Booth
8d2ccf0f4f AK: Implement IPV4 host URL parsing to specification
This implements both the parsing and serialization IPV4 parts from
the URL spec.
2023-07-24 17:07:16 -04:00
Shannon Booth
50359567e0 AK: Add spec comments for URL spec defined member variables 2023-07-24 17:07:16 -04:00
Timothy Flynn
685c8c3d40 Revert "AK: Automatically copy all warn/warnln logs to debug console"
This reverts commit d48c68cf3f.

Unfortunately, this currently copies some warn() invocations that we do
*not* want in the debug console, such as test-js's use of OSC command 9
to report progress.
2023-07-22 12:19:53 -04:00
Lucas CHOLLET
f79165cefe AK: Make FixedArray movable 2023-07-21 10:47:34 -06:00
Andrew Kaster
3533d3e452 AK: Enable consteval workaround for Android NDK
Android isn't shipping clang-15 yet in any NDK, so use the existing
workaround on that platform.
2023-07-19 04:22:28 -06:00
Andreas Kling
f0ec104131 AK: Implement IPv6 host parsing in URLParser
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.

Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
2023-07-17 07:47:58 +02:00
Sam Atkins
d48c68cf3f AK: Automatically copy all warn/warnln logs to debug console
This is only enabled inside Serenity, as on Lagom, all out/warn/dbg logs
go to the same console anyway.
2023-07-16 00:59:13 +02:00
Shannon Booth
5625ca5cb9 AK: Rename URLParser::parse to URLParser::basic_parse
To make it more clear that this function implements
'concept-basic-url-parser' instead of 'concept-url-parser'.
2023-07-15 09:45:16 +02:00
Shannon Booth
7ef4689383 AK: Implement steps for state override in URL parser 2023-07-15 09:45:16 +02:00
Daniel Bertalan
aaf1b762ea AK: Remove redundant information from TypeErasedFormatParams
The array which contains the actual parameters is always located
immediately after the base `TypeErasedFormatParams` object of
`VariadicFormatParams`. Hence, storing a pointer to it inside a `Span`
is redundant. Changing it to a zero-length array saves 8 bytes.

Secondly, we limit the number of parameters to 256, so `m_size` and
`m_next_index` can be stored in a smaller data type than `size_t`,
saving us another 8 bytes.

This decreases the size of a single-element `VariadicFormatParams` from
48 to 32 bytes, thus reducing the code size overhead of setting up
parameters for `dbgln()`.

Note that [arrays of length zero][1] are a GNU extension, but it's used
elsewhere in the codebase already and is explicitly supported by Clang
and GCC.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
2023-07-14 06:37:11 +02:00
Nico Weber
9fb0de1cfe AK: Mark Error nodiscard
...instead of manually marking all methods returning Error nodiscard.

No real behavior change.
2023-07-12 17:03:07 +02:00
Daniel Bertalan
cfadbcd950 AK: Work around Xcode 15 beta mishandling trailing requires clauses
Xcode 15 betas 1-3 lack https://reviews.llvm.org/D135772, which fixes a
bug that causes trailing `requires` clauses to be evaluated twice,
failing the second time. Reported as FB12284201.

This caused compile errors when instantiating types derived from RefPtr:
> error: invalid reference to function 'NonnullRefPtr': constraints not
> satisfied
> note: because substituted constraint expression is ill-formed: value
> of type '<dependent type>' is not contextually convertible to 'bool'.

This commit works around the issue by moving the `requires` clauses
after the template parameter list.

In most cases, trailing `requires` clauses and those specified after the
template parameter list work identically, so this change should not
impact the code's behavior. The only difference is that trailing
requires clauses are evaluated *after* constrained placeholder types
(i.e. `Integral auto i` function parameter).
2023-07-12 15:43:18 +01:00
Lucas CHOLLET
398f7ae988 AK: Move chunks a single time in cleanup_unused_chunks()
All elements of the vector were moved to the left, for each element to
remove. This patch makes the function move each element exactly once.

On the same test case as the previous commit, it makes the function
disappear from the profile. These two commits combined reduce the
decompression time by 12%.
2023-07-10 21:35:10 -04:00
Lucas CHOLLET
44bedf7844 AK: Don't reuse chunks in AllocatingMemoryStream
As confusing as it may sound, reusing them is terrible performance wise.
When profiling the PNG decoder, the result (which is dominated by the
Zlib decompression) shows that the `cleanup_unused_chunks()` function
represented 14.26% of the profile before this patch and only 7.7%
afterward.

On a 6.5 MB PNG image, it reduces the decompression time by more than
5%.
2023-07-10 21:35:10 -04:00
Tim Schumacher
8a9761de20 AK: Let Array::from_span take a readonly Span
We are copying here, so there is no need to require a non-const Span.
2023-07-09 15:40:41 +01:00
Hendiadyoin1
467bc5b093 AK: Add a generic version of log2
This uses one of Sun OS's algorithms, for a comparison to other
algorithms please refer to
https://gist.github.com/Hendiadyoin1/f58346d66637deb9156ef360aa158bf9

This is used on aarch64 builds and for x86 floats and doubles
for performance gains check
https://quick-bench.com/q/_2jTykshP6cUqtgdepFaoQ53YC8
which shows approximately 2x gains

Co-Authored-By: Ben Wiederhake <BenWiederhake.GitHub@gmx.de>
Co-Authored-By: kleines Filmröllchen <filmroellchen@serenityos.org>
Co-Authored-By: Dan Klishch <danilklishch@gmail.com>
2023-07-09 15:39:52 +01:00
Ali Mohammad Pur
5a0ad6812c AK: Add a CallableAs<R, Args...> concept
This is just the concept version of the IsCallableWithArguments type
trait previously defined in Function.h.
2023-07-08 23:13:00 +01:00
Timothy Flynn
996c020b0d Everywhere: Remove 'clang-format off' comments that are no longer needed 2023-07-08 10:32:56 +01:00
Timothy Flynn
c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Timothy Flynn
aff81d318b Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-16 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.h")
2023-07-08 10:32:56 +01:00
Tim Schumacher
60ac254df6 AK: Use hashing to accelerate searching a CircularBuffer 2023-07-06 15:06:20 +01:00
Tim Schumacher
4a10cf1506 AK: Make CircularBuffer::read_with_seekback const
Compared to the other read and write functions, this doesn't modify the
internal state of the circular buffer.
2023-07-06 15:06:20 +01:00
Tim Schumacher
42d01b21d8 AK: Rewrite the hint-based CircularBuffer::find_copy_in_seekback
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
2023-07-06 15:06:20 +01:00
Tim Schumacher
3526d67694 AK: Add Span<T>::matching_prefix_length 2023-07-06 15:06:20 +01:00
Tim Schumacher
2109f61b0d AK: Add search-related helpers to CircularBuffer
This factors out a lot of complicated math into somewhat understandable
functions.

While at it, rename `next_read_span_with_seekback` to
`next_seekback_span` to keep the naming consistent and to avoid making
function names any longer.
2023-07-06 15:06:20 +01:00