0ct0pu5/ladybird

Author	SHA1	Message	Date
Andreas Kling	68b1bdc234	LibWeb: Add a way to stop the new HTML parser Some things are specced to "stop parsing", which basically just means to stop fetching tokens and jump to "The end"	2020-05-28 18:55:18 +02:00
Andreas Kling	00b44ab148	LibWeb: Implement more of the "after body" insertion mode	2020-05-28 18:52:32 +02:00
Andreas Kling	cba5d59adc	LibWeb: Parse comments in the "in body" insertion mode	2020-05-28 18:46:39 +02:00
Andreas Kling	5f8cbe6a1b	LibWeb: Fix HTMLDocumentParser build	2020-05-28 18:20:55 +02:00
Andreas Kling	308cb69329	LibWeb: Remove a misplaced call to close_a_p_element() in "in body" This should only be done for the corresponding start tags.	2020-05-28 18:18:20 +02:00
Andreas Kling	c84212aaba	LibWeb: Add a StackOfOpenElements helper for "popping until a tag name"	2020-05-28 18:18:20 +02:00
Andreas Kling	5e53c45113	LibWeb: Plumb content encoding into the new HTML parser We still don't handle non-ASCII input correctly, but at least now we'll convert e.g. ISO-8859-1 to UTF-8 before starting to tokenize. This patch also makes "view source" work with the new parser. :^)	2020-05-28 12:35:19 +02:00
Andreas Kling	772b51038e	LibWeb: Parse "input" tags during the "in body" insertion mode	2020-05-28 12:19:18 +02:00
Andreas Kling	7aa7a2078f	LibWeb: Parse "td" start tags during "in cell" insertion mode	2020-05-28 11:46:08 +02:00
Andreas Kling	ebb1649a52	LibWeb: Implement more table support in the new HTML parser This is enough to parse the Google front page! (Note: I did have to hack the tokenizer while parsing Google, in order to avoid named character references screwing everything up. We'll fix that too soon enough!)	2020-05-28 00:27:46 +02:00
Andreas Kling	7f18c51f4c	LibWeb: Flesh out "reset the insertion mode appropriately" algorithm	2020-05-28 00:27:00 +02:00
Andreas Kling	2a97127faa	LibWeb: Handle various self-closing tags during "in body" insertion We can now parse self-closing "<img>" tags correctly! :^)	2020-05-28 00:25:56 +02:00
Andreas Kling	f69001339f	LibWeb: Handle inline stylesheets a bit better in the new parser While we're still supporting both the old and the new parser, we have to deal with the way they load inline stylesheets (and scripts) a bit differently. The old parser loads all the text content up front, and then notifies the containing element. The new parser creates the containing element up front and appends text inside it afterwards. For now, we simply do an empty "children_changed" notification when first inserting a text node inside an element. This at least prevents the CSS parser from choking on a single-character stylesheet.	2020-05-28 00:23:34 +02:00
Andreas Kling	3ce1af27dc	LibWeb: Parse documents without DOCTYPE gracefully Seems like SOMEONE forgot to put a <!DOCTYPE html> on serenityos.org.. No matter, now we can handle it in the new parser! :^)	2020-05-28 00:22:08 +02:00
Andreas Kling	d25ffd3ed8	LibWeb: Fire a DOMContentLoaded event when the new parser is finished With this change, we can finally load and render welcome.html :^)	2020-05-27 23:32:50 +02:00
Andreas Kling	db6cf9b37d	LibWeb: Implement the first half of the Adoption Agency Algorithm The AAA is a somewhat daunting algorithm you have to run for certain tags when they are inserted inside the <body> element. The purpose of it is to resolve issues with mismatched tags. This patch implements the first half of the AAA. We also move the "list of active formatting elements" to its own class, since it kept accumulating little behaviors. "Marker" entries are now signified by null Element pointers in the list.	2020-05-27 23:22:42 +02:00
Andreas Kling	4c9c6b3a7b	LibWeb: Bring up basic external script execution in the new parser This only works in some narrow cases, but should be enough for our own welcome.html at least. :^)	2020-05-27 23:02:03 +02:00
Andreas Kling	1b0c39ca60	LibWeb: Handle more benign parse errors in the "in body" insertion mode	2020-05-27 18:30:29 +02:00
TheDumpap	c700a30ce8	LibWeb: Handle additional parser inputs in "initial" and "before html".	2020-05-27 11:10:54 +02:00
Kevin Meyer	b85ab86c84	LibWeb: Fix step within reconstruct the active elements In step 4 of the "reconstruct the active formatting elements" algorithm it says: Rewind: If there are no entries before entry in the list of active formatting elements, then jump to the step labeled create. Prior to this patch, the implementation followed the spec only for the first loop iteration.	2020-05-26 21:52:46 +02:00
Andreas Kling	1e30ef239b	LibWeb: Start fleshing out the "in table" parser insertion mode	2020-05-25 20:30:34 +02:00
Andreas Kling	f62a8d3b19	LibWeb: Handle some more parser inputs in the "in head" insertion mode	2020-05-25 20:16:48 +02:00
Andreas Kling	50265858ab	LibWeb: Add a PARSE_ERROR() macro to the new HTML parser Unless otherwise stated, we shouldn't stop parsing just because there's a parse error, so let's allow ourselves to continue. With this change, we can now tokenize and parse the ACID1 test. :^)	2020-05-25 20:02:27 +02:00
Andreas Kling	1df2a3d8ce	LibWeb: Use String::is_one_of() a bunch in the HTML parser	2020-05-25 19:51:23 +02:00
Andreas Kling	4cbe202d2c	LibWeb: Finally parse enough that we can actually handle welcome.html! We made it, at last! What a long journey this was. :^)	2020-05-24 23:54:22 +02:00
Andreas Kling	65d8d5e83e	LibWeb: Yet more work towards parsing www/welcome.html :^)	2020-05-24 23:54:22 +02:00
Andreas Kling	45da08a1e6	LibWeb: A whole bunch of work towards spec-compliant <script> elements This is still very unfinished, but there's at least a skeleton of code.	2020-05-24 23:54:22 +02:00
Andreas Kling	5d332c1f11	LibWeb: Parse enough to handle a <style> inside a <head> :^)	2020-05-24 23:54:22 +02:00
Andreas Kling	af8a9331b2	LibWeb: Support comments in the "in head" insertion mode	2020-05-24 23:54:22 +02:00
Andreas Kling	20911efd4d	LibWeb: More work on the HTML parser and tokenizer The parser can now switch the state of the tokenizer! Very webby. :^)	2020-05-24 23:54:22 +02:00
Andreas Kling	31db3f21ae	LibWeb: Start implementing character token parsing Now that we've gotten rid of the misguided character buffering in the tokenizer, it actually spits out character tokens that we have to deal with in the parser. This patch implements enough to bring us back to speed with simple.html	2020-05-24 23:54:22 +02:00
Andreas Kling	53d2f4df70	LibWeb: Factor out the "stack of open elements" into its own class This will allow us to write more expressive parsing code. :^)	2020-05-24 23:54:22 +02:00
Daniel Gustafsson	6561987e9f	LibWeb: Fix copy-paste error in HTMLDocumentParser (#2358) When watching the video of the new HTML parser I noticed a small copy and paste error. In one of the cases in `handle_after_head` the code was checking for end tags when it should check for start tags. I haven't tested this change, just looking at the spec.	2020-05-24 13:48:46 +02:00
Andreas Kling	e44c87cfff	LibWeb: Implement enough HTML parsing to handle a small simple DOM :^) We can now parse a little DOM like this: <!DOCTYPE html> <html> <head></head> <body> <div></div> </body> </html> This is pretty slow work, but the incremental progress is satisfying!	2020-05-24 00:49:22 +02:00
Andreas Kling	fd1b31d0ff	LibWeb: Start building the tree building part of the new HTML parser This patch adds a new HTMLDocumentParser class. It keeps a tokenizer object internally and feeds itself with one token at a time from it. The names and idioms in this class are expressed as closely to the actual HTML parsing spec as possible, to make development as easy and bug free as possible. :^) This is going to become pretty large, but it's pretty cool!	2020-05-24 00:14:23 +02:00

35 commits