Commit graph

31 commits

Author SHA1 Message Date
Luke
08221139a5 test-web: Add ability to change page mid-test
This allows you to not have to write a separate test file
for the same thing but in a different situation.

This doesn't handle when you change the page with location.href
however.

Changes the name of the page load handlers to prevent confusion
with this.
2020-07-25 12:35:15 +02:00
Andreas Kling
6e02ef19d1 LibWeb: Add a helper for creating a fake (start tag) HTML token
Sometimes the parsing rules say we need to insert a fake HTML token.
Let's have a convenient way of doing that!
2020-07-23 17:31:08 +02:00
Luke
201cc1bfcc LibWeb: Assert we're parsing a fragment on fragment cases
The specification says that parts labelled as a "fragment case" will
only occur when parsing a fragment. It says that if it occurs when
not parsing a fragment, then it is a specification error.

We should probably assume at this point that it's an implementation
error. This fixes a few little mistakes that were caught out by this.

Also moves the context element outside insertion mode reset,
as other (unimplemented) parts refer to it, such as
"adjusted current node".

Also cleans up insertion mode reset.
2020-07-22 00:02:40 +02:00
Luke
19d6884529 LibWeb: Implement quirks mode detection
This allows us to determine which mode to render the page in.

Exposes "doctype" and "compatMode" on Document.
Exposes "name", "publicId" and "systemId" on DocumentType.
2020-07-21 01:08:32 +02:00
Andreas Kling
92d831c25b LibWeb: Implement fragment parsing and use it for Element.innerHTML
This patch implements most of the HTML fragment parsing algorithm and
ports Element::set_inner_html() to it. This was the last remaining user
of the old HTML parser. :^)
2020-06-26 00:53:25 +02:00
Andreas Kling
07d976716f LibWeb: Remove most uses of the old HTML parser
The only remaining client of the old parser is the fragment parser used
by the Element.innerHTML setter. We'll need to implement a bit more
stuff in the new parser before we can switch that over.
2020-06-21 22:29:05 +02:00
Andreas Kling
966bc05fef LibWeb: Implement more of the foster parenting algorithm in the parser 2020-06-21 17:42:00 +02:00
stelar7
5eb39a5f61 LibWeb: Update parser with more insertion modes :^)
Implements handling of InHeadNoScript, InSelectInTable, InTemplate,
InFrameset, AfterFrameset, and AfterAfterFrameset.
2020-06-21 10:13:31 +02:00
Luke
6532c1e2fa LibWeb: Implement HTML parser "in column group" insertion mode 2020-06-14 14:07:07 +02:00
Luke
2241b09cd0 LibWeb: Implement HTML parser "in caption" insertion mode 2020-06-14 14:07:07 +02:00
Andreas Kling
c40de9275a LibWeb: Buffer text node character insertions in the new parser
Instead of appending character-at-a-time, we now buffer character
insertions in a StringBuilder, and flush them to the relevant node
whenever we start inserting into a new node (and when parsing ends.)
2020-06-03 21:53:08 +02:00
Andreas Kling
ca6fbefbc9 LibWeb: Support parsing "select" elements (outside of tables) 2020-05-30 19:58:52 +02:00
Andreas Kling
5818ef2c80 LibWeb: Implement more table-related insertion modes 2020-05-30 18:26:44 +02:00
Andreas Kling
8c96b8174b LibWeb: Handle AAA situation where there's no formatting element found
In this case, we're supposed to return from the AAA and then jump to a
different behavior in the "in body" insertion mode. So now we do that.
2020-05-30 17:47:50 +02:00
Andreas Kling
6854f726ce LibWeb: Improve support for "a" and "li" during "in body" insertion
We can now parse welcome.html once again, without resorting to hacks
or fallbacks during "in body" :^)
2020-05-30 11:31:49 +02:00
Andreas Kling
68b1bdc234 LibWeb: Add a way to stop the new HTML parser
Some things are specced to "stop parsing", which basically just means
to stop fetching tokens and jump to "The end"
2020-05-28 18:55:18 +02:00
Andreas Kling
5e53c45113 LibWeb: Plumb content encoding into the new HTML parser
We still don't handle non-ASCII input correctly, but at least now we'll
convert e.g ISO-8859-1 to UTF-8 before starting to tokenize.
This patch also makes "view source" work with the new parser. :^)
2020-05-28 12:35:19 +02:00
Andreas Kling
7aa7a2078f LibWeb: Parse "td" start tags during "in cell" insertion mode 2020-05-28 11:46:08 +02:00
Andreas Kling
ebb1649a52 LibWeb: Implement more table support in the new HTML parser
This is enough to parse the Google front page! (Note: I did have to
hack the tokenizer while parsing Google, in order to avoid named
character references screwing everything up. We'll fix that too soon
enough!)
2020-05-28 00:27:46 +02:00
Andreas Kling
db6cf9b37d LibWeb: Implement the first half of the Adoption Agency Algorithm
The AAA is a somewhat daunting algorithm you have to run for certain
tag when inserted inside the <body> element. The purpose of it is to
resolve issues with mismatched tags.

This patch implements the first half of the AAA. We also move the
"list of active formatting elements" to its own class, since it kept
accumulating little behaviors. "Marker" entries are now signified by
null Element pointers in the list.
2020-05-27 23:22:42 +02:00
Andreas Kling
4c9c6b3a7b LibWeb: Bring up basic external script execution in the new parser
This only works in some narrow cases, but should be enough for our own
welcome.html at least. :^)
2020-05-27 23:02:03 +02:00
Andreas Kling
1e30ef239b LibWeb: Start fleshing out the "in table" parser insertion mode 2020-05-25 20:30:34 +02:00
Andreas Kling
65d8d5e83e LibWeb: Yet more work towards parsing www/welcome.html :^) 2020-05-24 23:54:22 +02:00
Andreas Kling
45da08a1e6 LibWeb: A whole bunch of work towards spec-compliant <script> elements
This is still very unfinished, but there's at least a skeleton of code.
2020-05-24 23:54:22 +02:00
Andreas Kling
5d332c1f11 LibWeb: Parse enough to handle a <style> inside a <head> :^) 2020-05-24 23:54:22 +02:00
Andreas Kling
af8a9331b2 LibWeb: Support comments in the "in head" insertion mode 2020-05-24 23:54:22 +02:00
Andreas Kling
20911efd4d LibWeb: More work on the HTML parser and tokenizer
The parser can now switch the state of the tokenizer! Very webby. :^)
2020-05-24 23:54:22 +02:00
Andreas Kling
31db3f21ae LibWeb: Start implementing character token parsing
Now that we've gotten rid of the misguided character buffering in the
tokenizer, it actually spits out character tokens that we have to deal
with in the parser.

This patch implements enough to bring us back to speed with simple.html
2020-05-24 23:54:22 +02:00
Andreas Kling
53d2f4df70 LibWeb: Factor out the "stack of open elements" into its own class
This will allow us to write more expressive parsing code. :^)
2020-05-24 23:54:22 +02:00
Andreas Kling
e44c87cfff LibWeb: Implement enough HTML parsing to handle a small simple DOM :^)
We can now parse a little DOM like this:

<!DOCTYPE html>
<html>
    <head></head>
    <body>
        <div></div>
    </body>
</html>

This is pretty slow work, but the incremental progress is satisfying!
2020-05-24 00:49:22 +02:00
Andreas Kling
fd1b31d0ff LibWeb: Start building the tree building part of the new HTML parser
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.

The names and idioms in this class are expressed as closely to the
actual HTML parsing spec as possible, to make development as easy
and bug free as possible. :^)

This is going to become pretty large, but it's pretty cool!
2020-05-24 00:14:23 +02:00