The AAA is a somewhat daunting algorithm you have to run for certain
tag when inserted inside the <body> element. The purpose of it is to
resolve issues with mismatched tags.
This patch implements the first half of the AAA. We also move the
"list of active formatting elements" to its own class, since it kept
accumulating little behaviors. "Marker" entries are now signified by
null Element pointers in the list.
In step 4 of the "renstruct the active formatting elements" algorithm it
says:
Rewind: If there are no entries before entry in the list of active
formatting elements, then jump to the step labeled create.
Prior to this patch, the implementation accorded to the spec only for
the first loop iteration.
Unless otherwise stated, we shouldn't stop parsing just because there's
a parse error, so let's allow ourselves to continue.
With this change, we can now tokenize and parse the ACID1 test. :^)
Now that we've gotten rid of the misguided character buffering in the
tokenizer, it actually spits out character tokens that we have to deal
with in the parser.
This patch implements enough to bring us back to speed with simple.html
When watching the video of the new HTML parser I noticed a small copy
and paste error. In one of the cases in `handle_after_head` the code
was checking for end tags when it should check for start tags.
I haven't tested this change, just looking at the spec.
We can now parse a little DOM like this:
<!DOCTYPE html>
<html>
<head></head>
<body>
<div></div>
</body>
</html>
This is pretty slow work, but the incremental progress is satisfying!
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.
The names and idioms in this class are expressed as closely to the
actual HTML parsing spec as possible, to make development as easy
and bug free as possible. :^)
This is going to become pretty large, but it's pretty cool!