Call HTML::EventLoop::spin_until() from the HTML parser when deciding
whether we can run a script yet.
Note that spin_until() actually doesn't do any work yet.
Since we can't simply give HTML::EventLoop control of the whole program,
we have to integrate with Core::EventLoop.
We do this by having a single-shot 0ms Core::Timer that we start when
a task is added to the queue, and restart after processing the queue and
there are still tasks in the queue.
This patch attaches a HTML::EventLoop to the main thread JS::VM used
for JavaScript bindings in the web engine.
The goal here is to model the various task scheduling mechanisms of the
HTML specification.
No major engine allows whitespace in the type when checking for
"module".
This was also reflected in the relevant web platform test, but not in
the spec.
The spec has been changed to match this behaviour: 23c723e3e9
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
Change all the places that were including the deprecated parser, to
include the new one instead, and then delete the old parser code.
`ParentNode::query_selector[_all]()` now treat their input as a
comma-separated list of selectors, instead of just one, and return
elements that match any of the selectors in that list. This is according
to these specs:
- querySelector/querySelectorAll:
https://dom.spec.whatwg.org/#ref-for-dom-parentnode-queryselector%E2%91%A0
- selector matching algorithm:
https://www.w3.org/TR/selectors-4/#match-against-tree
The previous behavior of mapping a missing value to the "inherit"
state is incompatible. Now, a missing value maps to the "true" state,
which is the expected behavior.
This completely changes how HTMLTokens store their data. Previously,
space was allocated for all token types separately. Now, the HTMLToken's
data is stored in just a String, two booleans and a Variant.
This change reduces sizeof(HTMLToken) from 68 to 32. Also, this reduces
raw tokenization time by around 20 to 50 percent, depending on the page.
Full document parsing time (with HTMLDocumentParser, on a local HTML
page without any dependency files) is reduced by between 4 and 20
percent, depending on the page.
Since tokenizing HTML pages can easily generated 50'000 tokens and more,
the storage has been designed in a way that avoids heap allocations
where possible, while trying to reduce the size of the tokens. The only
tokens which need to allocate on the heap are thus DOCTYPE tokens (max.
1 per document), and tag tokens (but only if they have attributes). This
way, only around 5 percent of all tokens generated need to allocate on
the heap (except for StringImpl allocations).
Since all interaction with the HTMLToken class now happens over getters
and setters, there is no more need for HTMLTokenizer and
HTMLDocumentParser to have direct access to the members.
This is in preparation for an upcoming storage change of HTMLToken. In
contrast to the other token types, the accessor can hand out a mutable
reference to allow users to change parts of the DoctypeData easily.
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
This fixes parsing the following regular expression: /</g;
It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.