This allows you to not have to write a separate test file
for the same thing but in a different situation.
This doesn't handle when you change the page with location.href
however.
Changes the name of the page load handlers to prevent confusion
with this.
The specification says that parts labelled as a "fragment case" will
only occur when parsing a fragment. It says that if it occurs when
not parsing a fragment, then it is a specification error.
We should probably assume at this point that it's an implementation
error. This fixes a few little mistakes that were caught out by this.
Also moves the context element outside insertion mode reset,
as other (unimplemented) parts refer to it, such as
"adjusted current node".
Also cleans up insertion mode reset.
This allows us to determine which mode to render the page in.
Exposes "doctype" and "compatMode" on Document.
Exposes "name", "publicId" and "systemId" on DocumentType.
This patch implements most of the HTML fragment parsing algorithm and
ports Element::set_inner_html() to it. This was the last remaining user
of the old HTML parser. :^)
The only remaining client of the old parser is the fragment parser used
by the Element.innerHTML setter. We'll need to implement a bit more
stuff in the new parser before we can switch that over.
Instead of appending character-at-a-time, we now buffer character
insertions in a StringBuilder, and flush them to the relevant node
whenever we start inserting into a new node (and when parsing ends.)
We still don't handle non-ASCII input correctly, but at least now we'll
convert e.g ISO-8859-1 to UTF-8 before starting to tokenize.
This patch also makes "view source" work with the new parser. :^)
This is enough to parse the Google front page! (Note: I did have to
hack the tokenizer while parsing Google, in order to avoid named
character references screwing everything up. We'll fix that too soon
enough!)
The AAA is a somewhat daunting algorithm you have to run for certain
tag when inserted inside the <body> element. The purpose of it is to
resolve issues with mismatched tags.
This patch implements the first half of the AAA. We also move the
"list of active formatting elements" to its own class, since it kept
accumulating little behaviors. "Marker" entries are now signified by
null Element pointers in the list.
Now that we've gotten rid of the misguided character buffering in the
tokenizer, it actually spits out character tokens that we have to deal
with in the parser.
This patch implements enough to bring us back to speed with simple.html
We can now parse a little DOM like this:
<!DOCTYPE html>
<html>
<head></head>
<body>
<div></div>
</body>
</html>
This is pretty slow work, but the incremental progress is satisfying!
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.
The names and idioms in this class are expressed as closely to the
actual HTML parsing spec as possible, to make development as easy
and bug free as possible. :^)
This is going to become pretty large, but it's pretty cool!