0ct0pu5/ladybird

Author	SHA1	Message	Date
Luke	2241b09cd0	LibWeb: Implement HTML parser "in caption" insertion mode	2020-06-14 14:07:07 +02:00
Luke	821312729a	LibWeb: Fully implement all DOCTYPE tokenizer states Also fixes TagOpen having a separate emit and reconsume in ANYTHING_ELSE.	2020-06-14 13:47:19 +02:00
Andreas Kling	9b17bf3dcd	LibWeb: Use HTML::TagNames globals in the new HTML parser	2020-06-07 23:53:16 +02:00
Andreas Kling	be6abce44f	LibWeb: Handle EOF tokens during "text" insertion	2020-06-06 16:36:18 +02:00
Andreas Kling	3337365000	LibWeb: Parse param/source/track start tags during "in body" insertion	2020-06-05 21:59:46 +02:00
Andreas Kling	b4591f0037	LibWeb: Fix parsing of "<textarea></textarea>" When handling a "textarea" start tag, we have to ignore the next token if it's an LF ('\n'). However, we were not switching the tokenizer state before fetching the lookahead token, and this caused us to force the tokenizer into the RCDATA state too late, effectively getting it stuck in that state for way longer than it should be. Fixes #2508.	2020-06-05 12:05:42 +02:00
Kyle McLean	b9549078cc	LibWeb: Handle "html" end tag during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	a3bf3a5d68	LibWeb: Handle "xmp" start tag during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	c70bd0ba58	LibWeb: Handle "nobr" start tag during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	22521e57fd	LibWeb: Handle "form" end tag during "in body" if stack of open elements does not contain "template"	2020-06-04 09:09:33 +02:00
Kyle McLean	4edd0643a6	LibWeb: Handle NULL character during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	5e3972a946	LibWeb: Parse "body" end tags during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	1ad81e4833	LibWeb: Parse "br" end tags during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	9fca4b56d3	LibWeb: Parse end tags for "applet", "marquee", and "object" during "in body"	2020-06-04 09:09:33 +02:00
Andreas Kling	3c2fbc825c	LibWeb: Call children_changed() on text nodes when flushing characters Now that we flush characters in a single place, we can call the Text's children_changed() from there instead of having a goofy targeted hack for <style> elements. :^)	2020-06-03 22:13:29 +02:00
Andreas Kling	c40de9275a	LibWeb: Buffer text node character insertions in the new parser Instead of appending character-at-a-time, we now buffer character insertions in a StringBuilder, and flush them to the relevant node whenever we start inserting into a new node (and when parsing ends.)	2020-06-03 21:53:08 +02:00
Andreas Kling	410fa5abe0	LibWeb: Parse barebones document without doctype, <html>, etc. Last night I tried making a little test page that had a bunch of <img> elements and nothing else. It didn't work. Fix this by correctly adding a synthesized <html> element to the document if we get something else in the "before html" insertion mode.	2020-06-02 08:50:33 +02:00
Andreas Kling	e5ddb76a67	LibWeb: Support "td" and "th" start tags during "in table body" This makes it possible to load Google Image Search results. You can't see the images yet, but it's still something. :^)	2020-06-01 22:09:09 +02:00
Andreas Kling	8766e49a7c	LibWeb+Browser: Use the new HTML parser by default You can still run the old parser with "br -O", but the new one is good enough to be the default parser now. We'll fix issues as we go and eventually remove the old one completely. :^)	2020-06-01 19:08:31 +02:00
Andreas Kling	5944abf31c	LibWeb: More parser cases in the "in body" and "after after body" modes	2020-06-01 18:46:11 +02:00
Andreas Kling	8429551368	LibWeb: Implement more of the "after head" insertion mode	2020-06-01 18:46:11 +02:00
Andreas Kling	d058addd74	LibWeb: Handle "dd" and "dt" end tags during "in body"	2020-05-30 23:00:35 +02:00
Andreas Kling	ca6fbefbc9	LibWeb: Support parsing "select" elements (outside of tables)	2020-05-30 19:58:52 +02:00
Andreas Kling	60352c7b9b	LibWeb: Hack the parser to dodge <template> elements in <head> for now	2020-05-30 19:23:04 +02:00
Andreas Kling	ca23db10ef	LibWeb: Don't crash when encountering <svg> or <math> elements Just treat them like unknown elements for now. :^)	2020-05-30 18:46:39 +02:00
Andreas Kling	756829555a	LibWeb: Parse "textarea" tags during the "in body" insertion mode Had to handle some more cases in the tokenizer to support this.	2020-05-30 18:40:23 +02:00
Andreas Kling	f4778d1ba0	LibWeb: Add missing special tag case in the "in body" insertion mode	2020-05-30 18:26:44 +02:00
Andreas Kling	5818ef2c80	LibWeb: Implement more table-related insertion modes	2020-05-30 18:26:44 +02:00
Andreas Kling	8c96b8174b	LibWeb: Handle AAA situation where there's no formatting element found In this case, we're supposed to return from the AAA and then jump to a different behavior in the "in body" insertion mode. So now we do that.	2020-05-30 17:47:50 +02:00
Andreas Kling	f662b1ea37	LibWeb: Implement enough parsing to parse the HTML spec front page :^) We can now actually open http://html.spec.whatwg.org/ in Browser.	2020-05-30 13:07:47 +02:00
Andreas Kling	770372ad02	LibWeb: Handle end-of-file token during "in body" insertion mode	2020-05-30 12:40:12 +02:00
Andreas Kling	368044eabd	LibWeb: Flesh out the "in head" insertion mode and add missing cases	2020-05-30 12:28:12 +02:00
Andreas Kling	e82226f3fb	LibWeb: Handle two kinds of deferred script executions This patch adds two script lists to Document: - Scripts to execute when parsing has finished - Scripts to execute as soon as possible Since we don't actually load scripts asynchronously yet (we just do a synchronous load when parsing the <script> element for simplicity), these are already loaded by the time we get to "The end" of parsing.	2020-05-30 12:26:15 +02:00
Andreas Kling	fbd52047bb	LibWeb: Parse "form" tags during the "in body" insertion mode	2020-05-30 11:31:49 +02:00
Andreas Kling	b9d5d45eff	LibWeb: Handle an error condition for "a" start tag during "in body" If we have an <a> element on the list of active formatting elements when hitting another "a" start tag, that's a parse error. Recover by using the AAA.	2020-05-30 11:31:49 +02:00
Andreas Kling	1ef5d609d9	AK+LibC: Add TODO() as an alternative to ASSERT_NOT_REACHED() I've been using this in the new HTML parser and it makes it much easier to understand the state of unfinished code branches. TODO() is for places where it's okay to end up but we need to implement something there. ASSERT_NOT_REACHED() is for places where it's not okay to end up, and something has gone wrong.	2020-05-30 11:31:49 +02:00
Andreas Kling	cfbd95f42a	LibWeb: Turn a bunch of ASSERT_NOT_REACHED() in the parser into TODO()	2020-05-30 11:31:49 +02:00
Andreas Kling	6854f726ce	LibWeb: Improve support for "a" and "li" during "in body" insertion We can now parse welcome.html once again, without resorting to hacks or fallbacks during "in body" :^)	2020-05-30 11:31:49 +02:00
Andreas Kling	30d64fccde	LibWeb: Parse "li" start tags in the "in body" insertion mode	2020-05-30 11:31:49 +02:00
Andreas Kling	2b1517f215	LibWeb: Add all branches from the parsing spec to "in body" This makes us crash in TODO() more often, but it's better that we know what's missing instead of incorrectly ending up on the fallback path.	2020-05-30 11:31:49 +02:00
Andreas Kling	68b1bdc234	LibWeb: Add a way to stop the new HTML parser Some things are specced to "stop parsing", which basically just means to stop fetching tokens and jump to "The end"	2020-05-28 18:55:18 +02:00
Andreas Kling	00b44ab148	LibWeb: Implement more of the "after body" insertion mode	2020-05-28 18:52:32 +02:00
Andreas Kling	cba5d59adc	LibWeb: Parse comments in the "in body" insertion mode	2020-05-28 18:46:39 +02:00
Andreas Kling	5f8cbe6a1b	LibWeb: Fix HTMLDocumentParser build	2020-05-28 18:20:55 +02:00
Andreas Kling	308cb69329	LibWeb: Remove a misplaced call to close_a_p_element() in "in body" This should only be done for the corresponding start tags.	2020-05-28 18:18:20 +02:00
Andreas Kling	c84212aaba	LibWeb: Add a StackOfOpenElements helper for "popping until a tag name"	2020-05-28 18:18:20 +02:00
Andreas Kling	5e53c45113	LibWeb: Plumb content encoding into the new HTML parser We still don't handle non-ASCII input correctly, but at least now we'll convert e.g ISO-8859-1 to UTF-8 before starting to tokenize. This patch also makes "view source" work with the new parser. :^)	2020-05-28 12:35:19 +02:00
Andreas Kling	772b51038e	LibWeb: Parse "input" tags during the "in body" insertion mode	2020-05-28 12:19:18 +02:00
Andreas Kling	7aa7a2078f	LibWeb: Parse "td" start tags during "in cell" insertion mode	2020-05-28 11:46:08 +02:00
Andreas Kling	ebb1649a52	LibWeb: Implement more table support in the new HTML parser This is enough to parse the Google front page! (Note: I did have to hack the tokenizer while parsing Google, in order to avoid named character references screwing everything up. We'll fix that too soon enough!)	2020-05-28 00:27:46 +02:00

75 commits
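
Commit c40de9275a in the log above describes buffering text-node character insertions and flushing them whenever insertion moves to a new node, and again when parsing ends. The following is a minimal sketch of that idea only, not the LibWeb code: `TextNode` and `CharacterBuffer` are hypothetical names, and `std::string` stands in for the StringBuilder the commit message mentions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a DOM text node (not the LibWeb class).
struct TextNode {
    std::string data;
};

// Hypothetical helper: collects characters and writes them to the
// target node in one append instead of one append per character.
class CharacterBuffer {
public:
    // Queue one character for `node`. If the target node changed,
    // flush what was pending for the previous node first.
    void insert(TextNode& node, char ch)
    {
        if (m_target != &node) {
            flush();
            m_target = &node;
        }
        m_pending.push_back(ch);
    }

    // Append everything pending to the current target, then reset.
    // A parser would also call this once when parsing ends.
    void flush()
    {
        if (m_target && !m_pending.empty())
            m_target->data += m_pending;
        m_pending.clear();
        m_target = nullptr;
    }

private:
    TextNode* m_target { nullptr };
    std::string m_pending;
};
```

In this sketch a node's `data` stays unchanged until the buffer is flushed, either by inserting into a different node or by an explicit `flush()` call.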