We were neglecting to check the namespace when looking for a specific
type of element on the stack of open elements in many cases.
This caused us to confuse HTML and SVG elements.
Element::tag_name() returns an uppercased string for HTML elements,
which is usually not what's expected by the parser algorithms that look
at tag names.
It's actually possible for there to be no adjusted current node, when
the stack of open elements is empty. This was covered by one of the WPT
parsing tests.
If we reach the end of the token stream when "ignoring and moving on
to the next token" in the "in body" state, we should just not move
on to the next token, since there isn't one.
Covered by various WPT HTML parsing tests that will be imported in
a subsequent commit.
This fixes an issue where document.write() with only text input would
leave all the character data as unflushed text in the parser.
This fixes many of the WPT tests for document.write().
In particular, there was an assertion failure due to the temporary
parser document's "about base URL" being empty when trying to "parse a
URL" during parsing.
We fix this by copying the context element's document's about base URL
to the temporary parsing document while parsing a fragment.
This fixes a crash when loading search results on https://amazon.com/
In the HTML parser spec, there are 2 instances of the following text:
add the attribute and its corresponding value to that element
The "add the attribute" text does not have a corresponding spec link to
actually specify what to do. We currently use `set_attribute`, which can
throw an exception if the attribute name contains an invalid character
(such as '<'). Instead, switch to `append_attribute`, which allows such
attribute names. This behavior matches Firefox.
Note we cannot yet make the unclosed-html-element.html test match the
expectations of the unclosed-body-element.html due to another bug that
would prevent checking if the expected element has the right attribute.
That will be fixed in an upcoming commit.
These were being immediately stored in JS::GCPtrs (and dutifully visited
by HTMLParser), so creating temporary handles for them was a complete
waste of time.
Instead of allowing arbitrarily large values (which could eventually
overflow an i32), let's just cap them at the same limit as Firefox does.
Found by Domato.
Changes the signature of queue_global_task() from AK:Function to
JS::HeapFunction to be more clear to the user of the function that this
is what it uses internally.
This piggybacks on the same fragment serialization code that innerHTML
uses, but instead of constructing an imaginary parent element like the
spec asks us to, we just add a separate serialization mode that includes
the context element in the serialized markup.
This makes the image carousel on https://utah.edu/ show up :^)
No need to force an allocation. This makes a future patch a bit simpler,
where we will have the encoding as a String. With this patch, we won't
have to convert it to a ByteString.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
Normally, assigning to e.g document.body.onload will forward to
window.onload. However, in a detached DOM tree, there is no associated
window, so we have nowhere to forward to, making this a no-op.
The bulk of this change is making Document::window() return a nullable
pointer, as documents created by DOMParser or DOMImplementation do not
have an associated window object, and so must be able to return null
from here.
If a call to `document.write` inserts an incomplete HTML tag, e.g.:
document.write("<p");
we would previously continue parsing the document until we reached a
closing angle bracket. However, the spec states we should stop once we
reach the new insertion point.
HTML fragments are parsed with a temporary HTML document that never has
its flag set to say that it is ready to have scripts executed. For these
fragments, in the HTMLParser, these scripts are prepared, but
execute_script is never called on them.
This results in the HTMLParser waiting forever on the document to be
ready to have scripts executed.
To fix this, only wait for the document to be ready if we are definitely
going to execute a script.
This fixes a hang processing the HTML in the attached test, as seen on:
https://github.com/SerenityOS/serenityFixes: #22735
This is a little awkward: The spec requires when loading media documents
or ones that don't have a DOM, that we "act as if the user agent had
stopped parsing document" which means following this algorithm. Only a
few steps require an HTMLParser, but those that do, involve reaching
into its internals. The simplest solution I could think of (other than
duplicating this fairly hefty function) is making it static and taking
a Document and optional HTMLParser as parameters.
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
opt = view.to_int<I>();
else
opt = view.to_uint<I>();
```
For us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')