0ct0pu5/ladybird

Author	SHA1	Message	Date
Andreas Kling	7be36366be	LibWeb: Emit character/comment tokens lazily to accumulate more data Instead of emitting data-bearing tokens immediately, do it lazily at the next state change. This allows us to accumulate full bursts of text in between tags instead of having one token per character. :^)	2020-05-23 18:44:32 +02:00
Andreas Kling	45450c7edc	LibWeb: Make BEGIN_STATE and END_STATE include some {{{ and }}} This makes it a compile error to omit the END_STATE. Also add some more missing END_STATE's exposed by this (nice!) Thanks to @predmond for suggesting the multi-pair trick! :^)	2020-05-23 15:25:43 +02:00
Andreas Kling	2e4147d0fc	LibWeb: Add missing END_STATE for TagName Fixes #2339.	2020-05-23 10:33:23 +02:00
Andreas Kling	a58500fdc5	LibWeb: Teach HTMLTokenizer how to tokenize comments We can now correctly tokenize the welcome.html test page. :^)	2020-05-23 01:54:26 +02:00
Andreas Kling	6caa5661f3	LibWeb: Teach HTMLTokenizer how to tokenize attributes Properly tokenize single-quoted, double-quoted and unquoted attributes!	2020-05-23 01:22:15 +02:00
Andreas Kling	272b35d2e1	LibWeb: Begin work on a spec-compliant HTML parser In order to actually view the web as it is, we're gonna need a proper HTML parser. So let's build one! This patch introduces the Web::HTMLTokenizer class, which currently operates on a StringView input stream where it fetches (ASCII only atm) codepoints and tokenizes according to the HTML spec tokenization algo. The tokenizer state machine looks a bit weird but is written in a way that tries to mimic the spec as closely as possible, in order to make development easier and bugs less likely. This initial version is far from finished, but it can parse a trivial document with a DOCTYPE and open/close tags. :^)	2020-05-22 21:46:13 +02:00
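
Commit 45450c7edc says BEGIN_STATE and END_STATE now carry opening and closing braces so that a forgotten END_STATE no longer compiles. A minimal sketch of that idea follows. The macro bodies, the State enum and count_tag_opens() are hypothetical and simplified for illustration; LibWeb's actual definitions may differ.

```cpp
#include <string>

// Hypothetical two-state machine, far smaller than the real tokenizer.
enum class State { Data, TagOpen };

// Assumed shape of the macros: BEGIN_STATE opens three blocks and
// END_STATE closes them. Leaving out an END_STATE leaves the braces
// unbalanced, so the translation unit fails to compile.
#define BEGIN_STATE(state) case State::state: { { {
#define END_STATE } } } break;

// Counts how many times the machine sees '<' while in the Data state.
int count_tag_opens(const std::string& input)
{
    State state = State::Data;
    int count = 0;
    for (char c : input) {
        switch (state) {
            BEGIN_STATE(Data)
            if (c == '<') {
                ++count;
                state = State::TagOpen;
            }
            END_STATE

            BEGIN_STATE(TagOpen)
            if (c == '>')
                state = State::Data;
            END_STATE
        }
    }
    return count;
}
```

Deleting either END_STATE line in this sketch should produce a brace mismatch error rather than a silent fallthrough into the next state, which is the kind of mistake 2e4147d0fc had to fix by hand.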

6 commits
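
The lazy emission described in 7be36366be can be sketched roughly as below: text is buffered and only emitted as one token when the state changes, instead of one token per character. The Token struct and tokenize() function are hypothetical stand-ins, not LibWeb's HTMLToken or HTMLTokenizer, and the sketch ignores attributes, comments and DOCTYPEs.

```cpp
#include <string>
#include <vector>

// Hypothetical token type for illustration only.
struct Token {
    enum class Type { Text, Tag } type;
    std::string data;
};

// Splits input into text runs and tag names, emitting each text run
// as a single token at the next state change.
std::vector<Token> tokenize(const std::string& input)
{
    std::vector<Token> tokens;
    std::string pending;  // text accumulated but not yet emitted
    std::string tag_name; // name of the tag currently being read
    bool in_tag = false;

    // Emit the buffered text, if any, as one token.
    auto flush_pending = [&] {
        if (!pending.empty()) {
            tokens.push_back({ Token::Type::Text, pending });
            pending.clear();
        }
    };

    for (char c : input) {
        if (!in_tag) {
            if (c == '<') {
                flush_pending(); // state change: emit the text burst
                in_tag = true;
                tag_name.clear();
            } else {
                pending += c;
            }
        } else if (c == '>') {
            tokens.push_back({ Token::Type::Tag, tag_name });
            in_tag = false;
        } else {
            tag_name += c;
        }
    }
    flush_pending(); // end of input also flushes; an unterminated tag is dropped
    return tokens;
}
```

With this sketch, the input "Hello <b>world</b>!" yields five tokens (three text runs and two tags) rather than one text token per character.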