Web specs do not return through javascript percent decoded URL path
components - but we were doing this in a number of places due to the
default behaviour of URL::serialize_path.
Since percent encoded URL paths may not contain valid UTF-8 - this was
resulting in us crashing in these places.
For example - on an HTMLAnchorElement when retrieving the pathname for
the URL of:
http://ladybird.org/foo%C2%91%91
To fix this make the URL class only return the percent encoded
serialized path, matching the URL spec. When the decoded path is
required instead explicitly call URL::percent_decode.
This fixes a crash running WPT URL tests for the anchor element on:
https://wpt.live/url/a-element.html
This commit replaces all TLS connection code with wolfssl.
The certificate parsing code has to remain for now, as wolfssl does not
seem to have any exposed API for that.
The following command was used to clang-format these files:
clang-format-18 -i $(find . \
-not \( -path "./\.*" -prune \) \
-not \( -path "./Base/*" -prune \) \
-not \( -path "./Build/*" -prune \) \
-not \( -path "./Toolchain/*" -prune \) \
-not \( -path "./Ports/*" -prune \) \
-type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")
There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.
Making that change resulted in a whole bunch of fallout.
After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.
The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").
Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).
No functional changes, just a lot of new FIXMEs.
This adds support for WebSocket subprotocols to WebSocket DOM
objects, with some necessary plumbing to LibWebSocket and its
clients.
See the associated pull request for how this was tested.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Frames with large payloads may arrive in multiple chunks, so it's not
safe to assume that the whole frame is available for reading just
because we got a first "ready to read" notification.
This patch solves this in a very naive way by simply buffering incoming
frame data and trying to reparse a frame every time new data arrives.
This is definitely inefficient, but it works as a start.
With this, it's now possible to log in to Discord in Ladybird! :^)
The LibCore sockets implementation becomes WebSocketImplSerenity.
This will allow us to create a custom WebSocketImpl subclass for Qt
to use in Ladybird.
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to readd now missing dependencies to tools
that actually need them.
URL had properly named replacements for protocol(), set_protocol() and
create_with_file_protocol() already. This patch removes these function
and updates all call sites to use the functions named according to the
specification.
See https://url.spec.whatwg.org/#concept-url-scheme
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Similar reasoning to making Core::Stream::read() return Bytes, except
that every user of read_line() creates a StringView from the result, so
let's just return one right away.
A mistake I've repeatedly made is along these lines:
```c++
auto nread = TRY(source_file->read(buffer));
TRY(destination_file->write(buffer));
```
It's a little clunky to have to create a Bytes or StringView from the
buffer's data pointer and the nread, and easy to forget and just use
the buffer. So, this patch changes the read() function to return a
Bytes of the data that were just read.
The other read_foo() methods will be modified in the same way in
subsequent commits.
Fixes#13687
read_length in WebSocket::read_frame is used to track how many bytes we
have read in from the network. However, we was subtracting the number
of read in bytes instead of adding, underflowing it to about the 64-bit
unsigned integer limit.
This effectively limited us to only doing one read from the network.
This was only an issue if the server stalled when sending data,
which is especially common for large payloads. This would also cause us
to go out of sync. This meant when a new frame came in, we would read
the payload data of the previous frame as if it was the frame header
and payload of the next frame.
This allows us to read in the initial payload from Discord Gateway
that describes to the client the servers we are in, the emotes the
server has, the channels it has, etc. For an account that's only in
the Serenity Discord, this was about 20 KB (compressed!)
According to RFC 6455 sections 5.5.2-5.5.3 Ping and Pong frames
can have empty "Application data" that means payload can be of size 0.
This change fixes failed "buffer.size()" assertion inside
of Core::Stream::write_or_error by not trying to send empty payload
in WebSocket::send_frame.
As LibTLS now supports the Core::Stream APIs, we can get rid of the
split paths for TCP/TLS and significantly simplify the code as well.
Provided to you free of charge by the Core::Stream-ification team :^)
This commit converts TLS::TLSv12 to a Core::Stream object, and in the
process allows TLS to now wrap other Core::Stream::Socket objects.
As a large part of LibHTTP and LibGemini depend on LibTLS's interface,
this also converts those to support Core::Stream, which leads to a
simplification of LibHTTP (as there's no need to care about the
underlying socket type anymore).
Note that RequestServer now controls the TLS socket options, which is a
better place anyway, as RS is the first receiver of the user-requested
options (though this is currently not particularly useful).
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
Derivatives of Core::Object should be constructed through
ClassName::construct(), to avoid handling ref-counted objects with
refcount zero. Fixing the visibility means that misuses like this are
more difficult.
The callback should be called as soon as the connection is established,
and if we actually set the callback when it already is, we expect it to
be called immediately.
We don't want to destroy the WebSocketImpl while we're still using it
higher up the stack. By using deferred_invoke(), we allow the stack
to unwind before actually destroying any objects.
This fixes an issue with the WebSocket service crashing on immediate
connection failure.