This is a fetching AO and is only used by LibWeb in the context of fetch
tasks. Move it to LibWeb with other fetch methods.
The main reason for this is that it requires the use of other LibWeb AOs
such as the forgiving Base64 decoder and MIME sniffing. These AOs aren't
available within LibURL.
The HTMLMediaElement, for example, contains spec text which states any
ongoing fetch process must be "stopped". The spec does not indicate how
to do this, so our implementation is rather ad-hoc.
Our current implementation may cause a crash in places that assume one
of the fetch algorithms that we set to null is *not* null. For example:
if (fetch_params.process_response) {
queue_fetch_task([]() {
fetch_params.process_response();
};
}
If the fetch process is stopped after queuing the fetch task, but not
before the fetch task is run, we will crash when running this fetch
algorithm.
We now track queued fetch tasks on the fetch controller. When the fetch
process is stopped, we cancel any such pending task.
It is a little bit awkward maintaining a fetch task ID. Ideally, we
could use the underlying task ID throughout. But we do not have access
to the underlying task nor its ID when the task is running, at which
point we need some ID to remove from the pending task list.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
We previously used an empty optional to denote that a ReferrerPolicy is
in the default empty string state. However, later additions added an
explicit EmptyString state. This patch moves all users to the explicit
state, and stops using `Optional<ReferrerPolicy>` everywhere except for
when an option not being passed from JavaScript has meaning.
Along with putting functions in the URL namespace into a DOMURL
namespace.
This is done as LibWeb is in an awkward situation where it needs
two URL classes. AK::URL is the general purpose URL class which
is all that is needed in 95% of cases. URL in the Web namespace
is needed predominantly for interfacing with the javascript
interfaces.
Because of two URLs in the same namespace, AK::URL has had to be
used throughout LibWeb. If we move AK::URL into a URL namespace,
this becomes more painful - where ::URL::URL is required to
specify the constructor (and something like
::URL::create_with_url_or_path in other places).
To fix this problem - rename the class in LibWeb implementing the
URL IDL interface to DOMURL, along with moving the other Web URL
related classes into this DOMURL folder.
One could argue that this name also makes the situation a little
more clear in LibWeb for why these two URL classes need be used
in the first place.
The resource:// scheme is used for Core::Resource files. Currently, any
users of resource:// URLs in Ladybird must manually create the Resource
and extract its data. This will allow for passing the resource:// URL
along for LibWeb to handle.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
This is a hack on top of a hack because Workers don't *really* need to
have a Web::Page at all, but the ResourceLoader infra that should be
going away soon ™️ is not quite ready to axe that requirement for
cookies.
In FetchAlgorithms, it is common for callbacks to capture realms. This
can indirectly keep objects alive that hold FetchController with these
callbacks. This creates a cyclic dependency. However, when
JS::HeapFunction is used, this is not a problem, as captured by
callbacks values do not create new roots.
Stop worrying about tiny OOMs. Work towards #20449.
While going through these, I also changed the function signature in many
places where returning ThrowCompletionOr<T> is no longer necessary.
This adds a simple and incomplete implementation for extracting some
specific CORS headers that are used by fetch. This unifies the existing
ad-hoc parsing that already existed for Access-Control-Allow-Headers
and Access-Control-Allow-Methods, as well as adding
Access-control-Expose-Headers.
This adds the headers named in Access-Control-Expose-Headers to the
response's CORS-exposed header-name list which allows those headers to
be accessed from JS.
Parsing 'data:' URLs took it's own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.
Because parsing 'data:' didn't use standard fields, running the
following JS code:
new URL('#a', 'data:text/plain,hello').toString()
not only cleared the path as URLParser doesn't check for data from
data_payload() function (making the result be 'data:#a'), but it also
crashes the program because we forbid having an empty MIME type when we
serialize to string.
With this change, 'data:' URLs will be parsed like every other URLs.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which will return a struct containing MIME type with already
decoded data! :^)
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.
Making that change resulted in a whole bunch of fallout.
After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
Specifically, this makes `<link>` elements with an `integrity` attribute
actually work. Previously, we would load their resource, and then drop
it on the floor without actually using it.
The Subresource Integrity code is in `LibWeb/SRI`, since SRI is the name
of the recommendation spec: https://www.w3.org/TR/SRI/
However, the Fetch spec links to the editor's draft, which varies
significantly from the recommendation, and so that is what the code is
based on and what the spec comments link to:
https://w3c.github.io/webappsec-subresource-integrity/Fixes#18408
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.
The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
If we fail to set the response type to an error, calling code will think
the fetch was successful. We also should not default to an error code of
200, which would also indicate success.
This builds on the existing ad-hoc ResourceLoader code for HTTP fetches
which works for files as well.
This also includes a test that checks that stylesheets loaded with the
"file" URL scheme actually work.
The condition for checking if there was already a response in recursive
fetch was accidentally flipped, always causing a null deref.
This made redirects crash for example.
This makes Fetch rely less on using main_thread_vm().current_realm(),
which relies on the dummy execution context if no JavaScript is
currently running.
The majority of error strings are StringView literals, and it seems
silly to require heap-allocating strings for these.
This idea is stolen from a recent change in fd1fbad :^)