Commit graph

94 commits

Author SHA1 Message Date
Luke
08221139a5 test-web: Add ability to change page mid-test
This allows you to not have to write a separate test file
for the same thing but in a different situation.

This doesn't handle when you change the page with location.href
however.

Changes the name of the page load handlers to prevent confusion
with this.
2020-07-25 12:35:15 +02:00
Andreas Kling
3cb50a4714 LibWeb: Rename Element::tag_name() => local_name()
To prepare for fully qualified tag names, let's call this local_name.
Note that we still keep an Element::tag_name() around since that's what
the JS bindings end up calling into for the Element.tagName property.
2020-07-23 18:18:13 +02:00
Andreas Kling
6e02ef19d1 LibWeb: Add a helper for creating a fake (start tag) HTML token
Sometimes the parsing rules say we need to insert a fake HTML token.
Let's have a convenient way of doing that!
2020-07-23 17:31:08 +02:00
Luke
201cc1bfcc LibWeb: Assert we're parsing a fragment on fragment cases
The specification says that parts labelled as a "fragment case" will
only occur when parsing a fragment. It says that if it occurs when
not parsing a fragment, then it is a specification error.

We should probably assume at this point that it's an implementation
error. This fixes a few little mistakes that were caught out by this.

Also moves the context element outside insertion mode reset,
as other (unimplemented) parts refer to it, such as
"adjusted current node".

Also cleans up insertion mode reset.
2020-07-22 00:02:40 +02:00
Luke
19d6884529 LibWeb: Implement quirks mode detection
This allows us to determine which mode to render the page in.

Exposes "doctype" and "compatMode" on Document.
Exposes "name", "publicId" and "systemId" on DocumentType.
2020-07-21 01:08:32 +02:00
Nico Weber
e9d18e35d6 LibWeb: Move "Stop parsing!" behind PARSER_DEBUG
This makes SerenityOS's IRC client a lot less chatty.
2020-07-06 17:03:26 +02:00
theazgra
6a401a9bde LibWeb: Remove duplicate if branch in fragment parsing.
I noticed in the video the duplicate `if` check. This commit removes
the duplicated branch.
2020-06-26 11:58:53 +02:00
Andreas Kling
92d831c25b LibWeb: Implement fragment parsing and use it for Element.innerHTML
This patch implements most of the HTML fragment parsing algorithm and
ports Element::set_inner_html() to it. This was the last remaining user
of the old HTML parser. :^)
2020-06-26 00:53:25 +02:00
Andreas Kling
3a5af6ef61 LibWeb: Remove hacky old ways of running <script> element contents
Now that we're using the new HTML parser, we don't have to do the weird
"run the script when inserted into the document, uhh, or when the text
content of the script element changes" dance.

Instead, we just follow the spec, and scripts run the way they should.
2020-06-23 16:45:01 +02:00
Andreas Kling
07d976716f LibWeb: Remove most uses of the old HTML parser
The only remaining client of the old parser is the fragment parser used
by the Element.innerHTML setter. We'll need to implement a bit more
stuff in the new parser before we can switch that over.
2020-06-21 22:29:05 +02:00
Andreas Kling
dd7cd92de4 LibWeb: Fix two typo bugs in table parsing
These were flushed out by the earlier fix to "table scope". Without the
bad implementation of table scopes, ACID2 stopped parsing correctly.
2020-06-21 17:49:02 +02:00
Andreas Kling
15b5dfc794 LibWeb: A </table> inside <tbody> is not a parse error
This condition was backwards. Fixes parsing of google.com.
2020-06-21 17:42:00 +02:00
Andreas Kling
966bc05fef LibWeb: Implement more of the foster parenting algorithm in the parser 2020-06-21 17:42:00 +02:00
stelar7
5eb39a5f61 LibWeb: Update parser with more insertion modes :^)
Implements handling of InHeadNoScript, InSelectInTable, InTemplate,
InFrameset, AfterFrameset, and AfterAfterFrameset.
2020-06-21 10:13:31 +02:00
Andreas Kling
49cd03be95 LibWeb: Fix broken parsing of </form> during "in body" insertion 2020-06-15 20:31:19 +02:00
Andreas Kling
2f26d4c6a1 LibWeb: Fix broken parsing of </select> during "in select" insertion 2020-06-15 19:57:20 +02:00
Andreas Kling
17d26b92f8 LibWeb: Just ignore <script> elements that failed to load the script
We're never gonna be able to run them if we can't load them so just
let it go.
2020-06-15 18:37:48 +02:00
Luke
a01478c858 LibWeb: Fully implement HTML parser "in table" insertion mode
Also fixes some little mistakes in the "in body" insertion mode
that I found whilst cross-referencing.
2020-06-14 14:07:07 +02:00
Luke
6532c1e2fa LibWeb: Implement HTML parser "in column group" insertion mode 2020-06-14 14:07:07 +02:00
Luke
2241b09cd0 LibWeb: Implement HTML parser "in caption" insertion mode 2020-06-14 14:07:07 +02:00
Luke
821312729a LibWeb: Fully implement all DOCTYPE tokenizer states
Also fixes TagOpen having a seperate emit and reconsume in
ANYTHING_ELSE.
2020-06-14 13:47:19 +02:00
Andreas Kling
9b17bf3dcd LibWeb: Use HTML::TagNames globals in the new HTML parser 2020-06-07 23:53:16 +02:00
Andreas Kling
be6abce44f LibWeb: Handle EOF tokens during "text" insertion 2020-06-06 16:36:18 +02:00
Andreas Kling
3337365000 LibWeb: Parse param/source/track start tags during "in body" insertion 2020-06-05 21:59:46 +02:00
Andreas Kling
b4591f0037 LibWeb: Fix parsing of "<textarea></textarea>"
When handling a "textarea" start tag, we have to ignore the next token
if it's an LF ('\n'). However, we were not switching the tokenizer
state before fetching the lookahead token, and this caused us to force
the tokenizer into the RCDATA state too late, effectively getting it
stuck in that state for way longer than it should be.

Fixes #2508.
2020-06-05 12:05:42 +02:00
Kyle McLean
b9549078cc LibWeb: Handle "html" end tag during "in body" 2020-06-04 09:09:33 +02:00
Kyle McLean
a3bf3a5d68 LibWeb: Handle "xmp" start tag during "in body" 2020-06-04 09:09:33 +02:00
Kyle McLean
c70bd0ba58 LibWeb: Handle "nobr" start tag during "in body" 2020-06-04 09:09:33 +02:00
Kyle McLean
22521e57fd LibWeb: Handle "form" end tag during "in body" if stack of open elements does not contain "template" 2020-06-04 09:09:33 +02:00
Kyle McLean
4edd0643a6 LibWeb: Handle NULL character during "in body" 2020-06-04 09:09:33 +02:00
Kyle McLean
5e3972a946 LibWeb: Parse "body" end tags during "in body" 2020-06-04 09:09:33 +02:00
Kyle McLean
1ad81e4833 LibWeb: Parse "br" end tags during "in body" 2020-06-04 09:09:33 +02:00
Kyle McLean
9fca4b56d3 LibWeb: Parse end tags for "applet", "marquee", and "object" during "in body" 2020-06-04 09:09:33 +02:00
Andreas Kling
3c2fbc825c LibWeb: Call children_changed() on text nodes when flushing characters
Now that we flush characters in a single place, we can call the Text's
children_changed() from there instead of having a goofy targeted hack
for <style> elements. :^)
2020-06-03 22:13:29 +02:00
Andreas Kling
c40de9275a LibWeb: Buffer text node character insertions in the new parser
Instead of appending character-at-a-time, we now buffer character
insertions in a StringBuilder, and flush them to the relevant node
whenever we start inserting into a new node (and when parsing ends.)
2020-06-03 21:53:08 +02:00
Andreas Kling
410fa5abe0 LibWeb: Parse barebones document without doctype, <html>, etc.
Last night I tried making a little test page that had a bunch of <img>
elements and nothing else. It didn't work.

Fix this by correctly adding a synthesized <html> element to the
document if we get something else in the "before html insertion mode.
2020-06-02 08:50:33 +02:00
Andreas Kling
e5ddb76a67 LibWeb: Support "td" and "th" start tags during "in table body"
This makes it possible to load Google Image Search results. You can't
see the images yet, but it's still something. :^)
2020-06-01 22:09:09 +02:00
Andreas Kling
8766e49a7c LibWeb+Browser: Use the new HTML parser by default
You can still run the old parser with "br -O", but the new one is good
enough to be the default parser now. We'll fix issues as we go and
eventually remove the old one completely. :^)
2020-06-01 19:08:31 +02:00
Andreas Kling
5944abf31c LibWeb: More parser cases in the "in body" and "after after body" modes 2020-06-01 18:46:11 +02:00
Andreas Kling
8429551368 LibWeb: Implement more of the "after head" insertion mode 2020-06-01 18:46:11 +02:00
Andreas Kling
d058addd74 LibWeb: Handle "dd" and "dt" end tags during "in body" 2020-05-30 23:00:35 +02:00
Andreas Kling
ca6fbefbc9 LibWeb: Support parsing "select" elements (outside of tables) 2020-05-30 19:58:52 +02:00
Andreas Kling
60352c7b9b LibWeb: Hack the parser to dodge <template> elements in <head> for now 2020-05-30 19:23:04 +02:00
Andreas Kling
ca23db10ef LibWeb: Don't crash when encountering <svg> or <math> elements
Just treat them like unknown elements for now. :^)
2020-05-30 18:46:39 +02:00
Andreas Kling
756829555a LibWeb: Parse "textarea" tags during the "in body" insertion mode
Had to handle some more cases in the tokenizer to support this.
2020-05-30 18:40:23 +02:00
Andreas Kling
f4778d1ba0 LibWeb: Add missing special tag case in the "in body" insertion mode 2020-05-30 18:26:44 +02:00
Andreas Kling
5818ef2c80 LibWeb: Implement more table-related insertion modes 2020-05-30 18:26:44 +02:00
Andreas Kling
8c96b8174b LibWeb: Handle AAA situation where there's no formatting element found
In this case, we're supposed to return from the AAA and then jump to a
different behavior in the "in body" insertion mode. So now we do that.
2020-05-30 17:47:50 +02:00
Andreas Kling
f662b1ea37 LibWeb: Implement enough parsing to parse the HTML spec front page :^)
We can now actually open http://html.spec.whatwg.org/ in Browser.
2020-05-30 13:07:47 +02:00
Andreas Kling
770372ad02 LibWeb: Handle end-of-file token during "in body" insertion mode 2020-05-30 12:40:12 +02:00