0ct0pu5/ladybird

Author	SHA1	Message	Date
Hendiadyoin1	6a95df2526	LibTextCodec: Don't allocate Strings on encoding normalisation This ripples down to LibWeb's HTML and XHR decoders, which therefore become less allocation heavy.	2022-03-21 10:48:17 +01:00
Simon Wanner	e154c2c2ca	LibWeb: Implement "has element in select scope" per-spec The HTML Specification is quite tricky in this case. Usually "have a particular element in <x> scope" mentions "consisting of the following element types:", but in this case it's "consisting of all element types except the following:" Thanks to @AtkinsSJ for spotting this difference	2022-03-21 10:47:46 +01:00
Simon Wanner	1d95745901	LibWeb: Implement the rest of the Adoption Agency Algorithm This gets us 2 points on html5test.com :^) - Before: https://html5te.st/4cf57659bc08272e (208) - After: https://html5te.st/fb8a9259bda1c115 (210)	2022-03-20 02:52:37 +01:00
Andreas Kling	cbd343dced	LibWeb: Only delay "load" event for script elements that load something We shouldn't delay the load event for scripts that we're completely refusing to run anyway. Also, for scripts that have inline text content, we don't need to delay them either, as they will become ready before returning from "prepare script". This makes the "load" event finally fire on lots of websites, including Wikipedia. :^)	2022-03-19 16:11:36 +01:00
Andreas Kling	2c9dfadb21	LibWeb: Don't delay document "load" event for unclosed script tags We previously had a bug where markup with unclosed script tags caused the document load event to be delayed indefinitely. Fix this by only marking script elements as delaying the load event once we encounter the script end tag.	2022-03-19 15:04:48 +01:00
Lenny Maiorani	c37820b898	Libraries: Use default constructors/destructors in LibWeb https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#cother-other-default-operation-rules "The compiler is more likely to get the default semantics right and you cannot implement these functions better than the compiler."	2022-03-17 17:23:49 +00:00
Idan Horowitz	c575710e5e	LibWeb: Use inline script tag source line as javascript line offset This makes JS exception line numbers meaningful for inline script tags.	2022-03-14 00:25:33 +01:00
Linus Groh	1422bd45eb	LibWeb: Move Window from DOM directory & namespace to HTML The Window object is part of the HTML spec. :^) https://html.spec.whatwg.org/multipage/window-object.html	2022-03-08 00:30:30 +01:00
Andreas Kling	1061c863f8	LibWeb: Fix issue where double-quoted doctype system ID was not captured We were storing double-quoted system IDs in the public ID field. 1% progression on ACID3. :^)	2022-03-02 12:30:15 +01:00
Luke Wilde	46c0d0f7ae	LibWeb: Associate form elements with a form in parsing and dynamically This makes it available for all form associated elements and not just select and input elements. It also makes it more spec compliant, especially around the form attribute. The main thing missing is re-associating form elements with a form attribute when the form attribute changes or an element with an ID is inserted/removed or has its ID changed.	2022-03-01 23:19:41 +01:00
Andreas Kling	8b2499b112	LibWeb: Make document.write() work while document is parsing This necessitated making HTMLParser ref-counted, and having it register itself with Document when created. That makes it possible for scripts to add new input at the current parser insertion point. There is now a reference cycle between Document and HTMLParser. This cycle is explicitly broken by calling Document::detach_parser() at the end of HTMLParser::run(). This is a huge progression on ACID3, from 31% to 49%! :^)	2022-02-21 22:00:28 +01:00
Lorenz Steinert	db789813c9	LibWeb: Add basic support for dynamic markup insertion This implements basic support for dynamic markup insertion, adding * Document::open() * Document::write(Vector<String> const&) * Document::writeln(Vector<String> const&) * Document::close() The HTMLParser is modified to make it possible to create a script-created parser which initially only contains an HTMLTokenizer without any data. Additionally, the HTMLParser::run method gains an overload which does not modify the Document and does not run HTMLParser::the_end() so that we can reenter the parser at a later time. Furthermore, all FIXMEs that concern the insertion point, which is defined in the HTMLTokenizer, are implemented. Additionally, the following member variables of the HTMLParser are now exposed by getter functions: * m_tokenizer * m_aborted * m_script_nesting_level The HTMLTokenizer is modified so that it contains an insertion point which keeps track of where the next input from the Document::write functions will be inserted. The insertion point is implemented as the character offset into m_decoded_input and a boolean describing if the insertion point is defined. Functions to update, check and {re}store the insertion point are also added. The function HTMLTokenizer::insert_eof is added to tell a script-created parser that document::close was called and HTMLParser::the_end() should be called. Lastly, an explicit default constructor is added to HTMLTokenizer to create an empty HTMLTokenizer into which data can be inserted.	2022-02-21 18:26:43 +01:00
Adam Hodgen	b6eaefa87d	LibWeb: Fix 'Comment end state' in HTML Tokenizer Also, update the expected hash in the LibWeb TestHTMLTokenizer regression test. This is due to the "This comment has a few too many dashes." comment token being updated.	2022-02-21 16:31:45 +01:00
Adam Hodgen	d73bb2633c	LibWeb: Implement tokenization newline preprocessing Newline normalization will replace \r and \r\n with \n. The spec specifically states > Before the tokenization stage, the input stream must be preprocessed > by normalizing newlines. whereas this implements the processing during the tokenization itself. This should still exhibit the same behaviour, while keeping the tokenization logic in the same place.	2022-02-21 16:31:45 +01:00
Adam Hodgen	c6fcdd0f93	LibWeb: Fix off by one error in HTML Tokenizer In 'NamedCharacterReference' we attempt to look up the code point by an identifier, e.g. apos; becomes ' This is done by passing the entire rest of the document to the `HTML::code_points_from_entity` function. However, before this change we didn't send the final character, which meant that if the document ended in a named character reference the lookup would fail.	2022-02-21 16:31:45 +01:00
Luke Wilde	9845164f6a	LibWeb: Handle markers when reconstructing active formatting elements The entry we get from the active formatting elements list during the Rewind step of "reconstruct the active formatting elements" can be a marker. Previously we assumed it was not a marker, which can trigger an assertion failure with certain malformed HTML. If the entry in this step is a marker, the spec simply ignores it. This is step 6 of the algorithm. This also makes the index unsigned, as this algorithm is a no-op if the list is empty. Additionally, this also adds spec comments to this algorithm. Fixes #12668.	2022-02-20 10:59:42 +01:00
Andreas Kling	25504f6a1b	LibWeb: Use Vector::clear_with_capacity() in HTMLTokenizer This avoids constantly reallocating the Vector<HTMLToken>.	2022-02-19 14:45:59 +01:00
Linus Groh	06948df393	LibWeb: Fail gracefully when reaching the unimplemented part of the AAA Pages such as https://html5test.com are testing all sorts of weird, incomplete, and wrong HTML but can be useful or at least interesting for development - let's try to avoid crashing the process.	2022-02-15 23:24:34 +01:00
Linus Groh	892f6394b8	LibWeb: Implement state switch for "[CDATA[" in HTML parser	2022-02-15 23:24:34 +01:00
Linus Groh	3f7086f91a	LibWeb: Add an optional pointer to an HTMLParser to the HTMLTokenizer This is needed to access the 'adjusted current node' in the 'Markup declaration open state'. We don't want to create a full parser for something like syntax highlighting, so it's optional (null) by default.	2022-02-15 23:24:34 +01:00
Linus Groh	9130ecfd5e	LibWeb: Remove unused HTMLParser function declaration There is no implementation of this function: HTMLParser::stack_of_open_elements_has_element_with_tag_name_in_scope	2022-02-15 23:24:34 +01:00
Linus Groh	f61fb08492	LibWeb: Add spec links to each HTML tokenizer state section I didn't add full spec comments this time, but this is better than nothing :^)	2022-02-15 23:24:34 +01:00
Andreas Kling	1347c5032b	LibWeb: Add spec comments to the StackOfOpenElements class	2022-02-15 02:05:53 +01:00
Andreas Kling	5cdbea4ae0	LibWeb: Rename element_before() => element_immediately_above() This matches the spec terminology around the "stack of open elements".	2022-02-15 02:05:53 +01:00
Andreas Kling	6fe333607d	LibWeb: Add spec comments to find_appropriate_place_for_inserting_node()	2022-02-15 02:05:53 +01:00
Karol Kosek	c157c2148f	LibWeb: Don't emit current token on EOF in HTML Tokenizer Emitting tokens on EOF caused an infinite loop, freezing the app, which could be a bit annoying when writing an HTML comment at the end of the file in Text Editor. :^)	2022-02-14 12:50:44 +03:30
Karol Kosek	fb5e2670d6	LibWeb: Fix highlighting HTML comments Commit `b193351a99` caused the HTML comments to flash when changing the text cursor. Also, when double-clicking on a comment, the selection started from the beginning of the file instead. The following message was displaying when `TOKENIZER_TRACE_DEBUG` was enabled: (Tokenizer::nth_last_position) Invalid position requested: 4th-last of 4. Returning (0-0). Changing the `nth_last_position` to 3 fixes this. I'm guessing that's because the parser is at that moment on the second hyphen of the `<!--` string, so it has to go back only by three characters.	2022-02-14 12:50:44 +03:30
MacDue	b193351a99	LibWeb: Fix off-by-one in HTMLTokenizer::restore_to() The difference should be between m_utf8_iterator and the new position; if m_prev_utf8_iterator is used, one fewer source position is popped than required. This issue was not apparent on most pages since restore_to is used for tokens such as <!doctype> that are normally followed by a newline that resets the column to zero, but it can be seen on pages with minified HTML.	2022-02-13 14:51:09 +00:00
Luke Wilde	f71f404e0c	LibWeb: Introduce the Environment Settings Object The environment settings object is effectively the context a piece of script is running under, for example, it contains the origin, responsible document, realm, global object and event loop for the current context. This effectively replaces ScriptExecutionContext, but it cannot be removed in this commit as EventTarget still depends on it. https://html.spec.whatwg.org/multipage/webappapis.html#environment-settings-object	2022-02-08 17:47:44 +00:00
Sam Atkins	197759e30f	LibWeb: Fix off-by-one error when highlighting unquoted HTML attributes This fixes #11166	2021-12-10 21:27:13 +01:00
Sam Atkins	7196570f9b	LibWeb: Cast unused smart-pointer return values to void	2021-12-05 15:31:03 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Timothy Flynn	e01dfaac9a	LibWeb: Implement Attribute closer to the spec and with an IDL file Note our Attribute class is what the spec refers to as just "Attr". The main differences between the existing implementation and the spec are just that the spec defines more fields. Attributes can contain namespace URIs and prefixes. However, note that these are not parsed in HTML documents unless the document content-type is XML. So for now, these are initialized to null. Web pages are able to set the namespace via JavaScript (setAttributeNS), so these fields may be filled in when the corresponding APIs are implemented. The main change to be aware of is that an attribute is a node. This has implications on how attributes are stored in the Element class. Nodes are non-copyable and non-movable because these constructors are deleted by the EventTarget base class. This means attributes cannot be stored in a Vector or HashMap as these containers assume copyability / movability. So for now, the Vector holding attributes is changed to hold RefPtrs to attributes instead. This might change when attribute storage is implemented according to the spec (by way of NamedNodeMap).	2021-10-17 13:51:10 +01:00
Brian Gianforcaro	7defb893a9	LibWeb: Remove dead "outer loop" code in adoption agency algorithm	2021-10-10 13:48:04 +02:00
Luke Wilde	c0a64f7317	LibWeb: Check for HTML integration points in the tree constructor This particularly implements these two points: - "If the adjusted current node is an HTML integration point and the token is a start tag" - "If the adjusted current node is an HTML integration point and the token is a character token" This also adds spec comments to the tree constructor.	2021-10-01 12:26:41 +02:00
Andreas Kling	831fdcaabc	LibWeb: Add the PageTransitionEvent interface and fire "pageshow" events We now fire "pageshow" events at the appropriate time during document loading (done by the parser.) Note that there are no corresponding "pagehide" events yet.	2021-09-26 12:47:51 +02:00
Andreas Kling	508edcd217	LibWeb: Add a "page showing" flag to documents This will be used to determine whether "pageshow" and "pagehide" events are appropriate. We won't actually make use of it until we implement more of history traversal and document unloading.	2021-09-26 12:47:51 +02:00
Andreas Kling	a2f77a2e39	LibWeb: Implement "update the current document readiness" from spec The only difference from what we were already doing is that setting the same ready state twice no longer fires a "readystatechange" event. I don't think that could happen in practice though.	2021-09-26 12:47:51 +02:00
Andreas Kling	8496024756	LibWeb: Store HTML document ready state as an enum	2021-09-26 12:47:51 +02:00
Andreas Kling	dbba0a520f	LibWeb: Allow HTML parser to delay delivery of the document "load" event We will now spin in "the end" until there are no more "things delaying the load event". Of course, nothing actually uses this yet, and there are a lot of things that need to.	2021-09-26 02:00:00 +02:00
Andreas Kling	e7af6af626	LibWeb: Implement more of HTMLParser::the_end() and bring closer to spec	2021-09-26 00:52:19 +02:00
Andreas Kling	e452550fda	LibWeb: Split out "The end" from the HTML parsing spec to a function Also add a spec link and some comments.	2021-09-26 00:04:33 +02:00
Andreas Kling	f67648f872	LibWeb: Rename HTMLDocumentParser => HTMLParser	2021-09-25 23:36:43 +02:00
Ben Wiederhake	32e98d0924	Libraries: Use AK::Variant default initialization where appropriate	2021-09-21 04:22:52 +04:30
Andreas Kling	c34da16089	LibWeb: Make <script src> loads partially async (by following the spec) Instead of firing up a network request and synchronously blocking for it to finish via a nested event loop, we now start an asynchronous request when encountering <script src>. Once the script load finishes (or fails), it gets executed at one of the synchronization points in the HTML parser. This solves some long-standing issues with random unexpected events getting dispatched in the middle of parsing.	2021-09-20 17:22:25 +02:00
Andreas Kling	e11ae33c66	LibWeb: Pop entire stack of open elements at the end of parsing	2021-09-20 17:22:25 +02:00
Andreas Kling	cb895edad4	LibWeb: Move Attribute into the DOM namespace	2021-09-16 01:39:47 +02:00
Andreas Kling	70398645f3	LibWeb: Improvements to error handling in HTML foreign content parsing Follow the spec more closely when encountering an invalid start or end tag during foreign content parsing.	2021-09-14 23:49:45 +02:00
Luke Wilde	f62477c093	LibWeb: Implement HTML fragment serialisation and use it in innerHTML The previous implementation was about a half implementation and was tied to Element::innerHTML. This separates it and puts it into HTMLDocumentParser, as this is in the parsing section of the spec. This provides a near finished HTML fragment serialisation algorithm, bar namespaces in attributes and the `is` value.	2021-09-14 02:09:18 +02:00
Idan Horowitz	4629f2e4ad	LibWeb: Add the Web::URL namespace and move URLEncoder to it This namespace will be used for all interfaces defined in the URL specification, like URL and URLSearchParams. This has the unfortunate side-effect of requiring us to use the fully qualified AK::URL name whenever we want to refer to the AK class, so this commit also fixes all such references.	2021-09-13 01:43:10 +02:00


111 commits