0ct0pu5/ladybird

Author	SHA1	Message	Date
stelar7	e547f5887e	LibWeb: Fix Array OOBs in the HTMLTokenizer Accessing last() if there are no elements makes WebContent crash :^)	2022-06-03 12:29:11 +01:00
Andreas Kling	1061c863f8	LibWeb: Fix issue where double-quoted doctype system ID was not captured We were storing double-quoted system IDs in the public ID field. 1% progression on ACID3. :^)	2022-03-02 12:30:15 +01:00
Lorenz Steinert	db789813c9	LibWeb: Add basic support for dynamic markup insertion This implements basic support for dynamic markup insertion, adding * Document::open() * Document::write(Vector<String> const&) * Document::writeln(Vector<String> const&) * Document::close() The HTMLParser is modified to make it possible to create a script-created parser which initially only contains an HTMLTokenizer without any data. Additionally, the HTMLParser::run method gains an overload which does not modify the Document and does not run HTMLParser::the_end(), so that we can reenter the parser at a later time. Furthermore, all FIXMEs that concern the insertion point, which is defined in the HTMLTokenizer, are implemented. Additionally, the following member variables of the HTMLParser are now exposed by getter functions: * m_tokenizer * m_aborted * m_script_nesting_level The HTMLTokenizer is modified so that it contains an insertion point which keeps track of where the next input from the Document::write functions will be inserted. The insertion point is implemented as the character offset into m_decoded_input and a boolean describing whether the insertion point is defined. Functions to update, check and {re}store the insertion point are also added. The function HTMLTokenizer::insert_eof is added to tell a script-created parser that Document::close was called and HTMLParser::the_end() should be called. Lastly, an explicit default constructor is added to HTMLTokenizer to create an empty HTMLTokenizer into which data can be inserted.	2022-02-21 18:26:43 +01:00
Adam Hodgen	b6eaefa87d	LibWeb: Fix 'Comment end state' in HTML Tokenizer Also, update the expected hash in the LibWeb TestHTMLTokenizer regression test. This is due to the "This comment has a few too many dashes." comment token being updated.	2022-02-21 16:31:45 +01:00
Adam Hodgen	d73bb2633c	LibWeb: Implement tokenization newline preprocessing Newline normalization will replace \r and \r\n with \n. The spec specifically states > Before the tokenization stage, the input stream must be preprocessed > by normalizing newlines. whereas this implements the preprocessing during tokenization itself. This should still exhibit the same behaviour, while keeping the tokenization logic in the same place.	2022-02-21 16:31:45 +01:00
Adam Hodgen	c6fcdd0f93	LibWeb: Fix off by one error in HTML Tokenizer In 'NamedCharacterReference' we attempt to look up the code point by an identifier, e.g. `apos;` becomes `'`. This is done by passing the entire rest of the document to the `HTML::code_points_from_entity` function. However, before this change we didn't send the final character, which meant that if the document ended in a named character reference, the lookup would fail.	2022-02-21 16:31:45 +01:00
Andreas Kling	25504f6a1b	LibWeb: Use Vector::clear_with_capacity() in HTMLTokenizer This avoids constantly reallocating the Vector<HTMLToken>.	2022-02-19 14:45:59 +01:00
Linus Groh	892f6394b8	LibWeb: Implement state switch for "[CDATA[" in HTML parser	2022-02-15 23:24:34 +01:00
Linus Groh	f61fb08492	LibWeb: Add spec links to each HTML tokenizer state section I didn't add full spec comments this time, but this is better than nothing :^)	2022-02-15 23:24:34 +01:00
Karol Kosek	c157c2148f	LibWeb: Don't emit current token on EOF in HTML Tokenizer Emitting tokens on EOF caused an infinite loop, freezing the app, which could be a bit annoying when writing an HTML comment at the end of the file in Text Editor. :^)	2022-02-14 12:50:44 +03:30
Karol Kosek	fb5e2670d6	LibWeb: Fix highlighting HTML comments Commit `b193351a99` caused the HTML comments to flash when changing the text cursor. Also, when double-clicking on a comment, the selection started from the beginning of the file instead. The following message was displayed when `TOKENIZER_TRACE_DEBUG` was enabled: (Tokenizer::nth_last_position) Invalid position requested: 4th-last of 4. Returning (0-0). Changing the `nth_last_position` to 3 fixes this. I'm guessing that's because the parser is at that moment on the second hyphen of the `<!--` string, so it has to go back only by three characters.	2022-02-14 12:50:44 +03:30
MacDue	b193351a99	LibWeb: Fix off-by-one in HTMLTokenizer::restore_to() The difference should be between m_utf8_iterator and the new position; if m_prev_utf8_iterator is used, one fewer source position is popped than required. This issue was not apparent on most pages since restore_to() is used for tokens such as <!doctype> that are normally followed by a newline that resets the column to zero, but it can be seen on pages with minified HTML.	2022-02-13 14:51:09 +00:00
Sam Atkins	197759e30f	LibWeb: Fix off-by-one error when highlighting unquoted HTML attributes This fixes #11166	2021-12-10 21:27:13 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Andreas Kling	f67648f872	LibWeb: Rename HTMLDocumentParser => HTMLParser	2021-09-25 23:36:43 +02:00
ovf	898b8ffcb6	LibWeb: Avoid assertion failure on parsing numeric character references	2021-07-28 18:32:22 +02:00
ovf	13c7d55320	LibWeb: Fix parsing of character references in attribute values	2021-07-27 00:03:43 +02:00
Max Wipfli	ccae0cae45	LibWeb: Rename HTMLToken::doctype_data() => ensure_doctype_data() This renames the accessor to better reflect what it does, as this will allocate a DoctypeData struct if there is none.	2021-07-17 16:24:57 +04:30
Max Wipfli	25cba4387b	LibWeb: Add HTMLToken(Type) constructor and use it	2021-07-17 16:24:57 +04:30
Max Wipfli	f2e3c770f9	LibWeb: Use setter for HTMLToken::m_{start,end}_position	2021-07-17 16:24:57 +04:30
Max Wipfli	8b31e41692	LibWeb: Change HTMLToken::m_doctype into named DoctypeData struct This is in preparation for an upcoming storage change of HTMLToken. In contrast to the other token types, the accessor can hand out a mutable reference to allow users to change parts of the DoctypeData easily.	2021-07-17 16:24:57 +04:30
Max Wipfli	918bde98b1	LibWeb: Hide implementation details of HTMLToken attribute list Previously, HTMLToken would expose the Vector<Attribute> directly to its users. In preparation for a future change, all users now use implementation-agnostic APIs which do not expose the Vector directly.	2021-07-17 16:24:57 +04:30
Max Wipfli	15d8635afc	LibWeb: Use getter+setter for HTMLToken tag name and self-closing flag	2021-07-17 16:24:57 +04:30
Max Wipfli	1aeafcc58b	LibWeb: Use getter and setter for Character type HTMLTokens While storing the code point in a UTF-8 encoded String is horrendously inefficient, this problem will be addressed at a later stage.	2021-07-17 16:24:57 +04:30
Max Wipfli	e8e9426b4f	LibWeb: Use getter and setter for Comment type HTMLTokens	2021-07-17 16:24:57 +04:30
Max Wipfli	f886aa15b8	LibWeb: Rename HTMLToken::AttributeBuilder struct to Attribute This does not contain StringBuilders anymore, so it can do with a simpler name: Attribute.	2021-07-17 16:24:57 +04:30
Max Wipfli	e22a34badb	LibWeb: Fix assertion failures in HTMLTokenizer The *TagName states are all very similar, so it seems to be correct to apply the fix from #8761 to all of those states. This fixes #8788.	2021-07-16 11:55:55 +02:00
Max Wipfli	2404ad6897	LibWeb: Fix assertion failure when tokenizing JS regex literals This fixes parsing the following regular expression: /</g; It also adds a simple script element to the HTMLTokenizer regression test, which also contains that specific regex.	2021-07-15 01:47:22 +02:00
Max Wipfli	bb2aed7d76	LibWeb: Correct behavior of Comment* states in HTMLTokenizer Previously, this would lead to assertion failures when parsing HTML comments. This fixes #8757.	2021-07-15 00:48:45 +02:00
Max Wipfli	af0b483123	LibWeb: VERIFY an empty builder when emitting tokens in HTMLTokenizer	2021-07-15 00:48:45 +02:00
Max Wipfli	125982943a	LibWeb: Change HTMLTokenizer.{cpp,h} to east const style	2021-07-14 23:03:36 +02:00
Gunnar Beutner	300823c314	LibWeb: Use move() when enqueuing tokens in HTMLTokenizer We're not using the current token anymore once it's enqueued so let's use move() when enqueuing the tokens.	2021-07-14 23:03:36 +02:00
Gunnar Beutner	c3ad8e9a52	LibWeb: Remove StringBuilder from HTMLToken::m_comment_or_character	2021-07-14 23:03:36 +02:00
Gunnar Beutner	3aa202c432	LibWeb: Remove StringBuilder from HTMLToken::m_tag	2021-07-14 23:03:36 +02:00
Gunnar Beutner	901d71148b	LibWeb: Remove StringBuilders from HTMLToken::AttributeBuilder	2021-07-14 23:03:36 +02:00
Gunnar Beutner	992964aa7d	LibWeb: Remove StringBuilders from HTMLToken::m_doctype	2021-07-14 23:03:36 +02:00
Gunnar Beutner	d9e52997e2	LibWeb: Use an Optional<String> to track the last HTML start tag Using an HTMLToken object here is unnecessary because the only attribute we're interested in is the tag_name.	2021-07-14 23:03:36 +02:00
Andreas Kling	dc65f54c06	AK: Rename Vector::append(Vector) => Vector::extend(Vector) Let's make it a bit more clear when we're appending the elements from one vector to the end of another vector.	2021-06-12 13:24:45 +02:00
Max Wipfli	282a623853	LibWeb: Change a few source end positions in HTMLTokenizer This patch aims to fix wrong highlighting for some cases in HTML's syntax highlighter. The values were somewhat experimentally determined and are subject to change. Regardless, it should be more correct with this patch than without it. :^)	2021-06-05 00:32:28 +04:30
Max Wipfli	932161e581	LibWeb: Be more forgiving when adding source positions in HTMLTokenizer This patch changes HTMLTokenizer::nth_last_position to not fail if the requested position is not available. Rather, it will just return (0-0). While this is not the correct solution, it prevents the tokenizer from crashing just because it cannot find a source position. This should only affect SyntaxHighlighter.	2021-06-05 00:32:28 +04:30
Max Wipfli	bc8d16ad28	Everywhere: Replace ctype.h to avoid narrowing conversions This replaces ctype.h with CharacterType.h everywhere I could find issues with narrowing conversions. While using it will probably make sense almost everywhere in the future, the most critical places should have been addressed.	2021-06-03 13:31:46 +02:00
Andreas Kling	407d6cd9e4	AK: Rename Utf8CodepointIterator => Utf8CodePointIterator	2021-06-01 09:45:52 +02:00
Ali Mohammad Pur	1822d6b8ac	LibWeb: Fix invalid behaviour of HTMLTokenizer::skip() and restore_to() skip() is supposed to end up keeping the previous iterator only one index behind the current one, and restore_to() should actually do the restore instead of just removing the now-useless source positions. Fixes #7331.	2021-05-21 09:22:35 +02:00
Ali Mohammad Pur	97a230e4ef	LibWeb: Add a super basic HTML syntax highlighter This can currently highlight tag names and attribute names/values.	2021-05-20 22:06:45 +02:00
Ali Mohammad Pur	aa7939bc6c	LibWeb: Add position tracking information to HTML tokens	2021-05-20 22:06:45 +02:00
Brian Gianforcaro	6d69c97b99	LibWeb: Utilize SourceLocation for HTMLTokenizer logging	2021-04-25 09:32:03 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Andreas Kling	5d180d1f99	Everywhere: Rename ASSERT => VERIFY (...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.	2021-02-23 20:56:54 +01:00
AnotherTest	09a43969ba	Everywhere: Replace dbgln<flag>(...) with dbgln_if(flag, ...) Replacement made by `find Kernel Userland -name '*.h' -o -name '*.cpp' | sed -i -Ee 's/dbgln\b<(\w+)>\(/dbgln_if(\1, /g'`	2021-02-08 18:08:55 +01:00
asynts	8465683dcf	Everywhere: Debug macros instead of constexpr. This was done with the following script: find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/dbgln<debug_([a-z_]+)>/dbgln<\U\1_DEBUG>/' {} \; find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/if constexpr \(debug_([a-z0-9_]+)/if constexpr \(\U\1_DEBUG/' {} \;	2021-01-25 09:47:36 +01:00
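The newline preprocessing described in d73bb2633c (normalizing while characters are consumed, instead of in a separate pass before tokenization) can be sketched as a reader that folds line endings on the fly. This is a minimal, hypothetical C++ sketch, not LibWeb's code: `NormalizingInput` and `normalize_newlines` are illustrative names, and the input is assumed to be a byte-oriented `std::string`. Both CR LF and a lone CR come out as a single LF, which matches the result of the spec's "normalize newlines" preprocessing.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a tokenizer's input cursor (not LibWeb's HTMLTokenizer).
class NormalizingInput {
public:
    explicit NormalizingInput(std::string input)
        : m_input(std::move(input))
    {
    }

    // Returns the next character, reporting CR LF and a lone CR both as LF,
    // or std::nullopt once the input is exhausted.
    std::optional<char> next()
    {
        if (m_pos >= m_input.size())
            return std::nullopt;
        char c = m_input[m_pos++];
        if (c == '\r') {
            // Swallow the LF of a CR LF pair so the pair yields a single LF.
            if (m_pos < m_input.size() && m_input[m_pos] == '\n')
                ++m_pos;
            return '\n';
        }
        return c;
    }

private:
    std::string m_input;
    std::size_t m_pos { 0 };
};

// Convenience helper: drain the reader into a string.
std::string normalize_newlines(std::string input)
{
    NormalizingInput reader(std::move(input));
    std::string out;
    while (auto c = reader.next())
        out.push_back(*c);
    return out;
}
```

Because the folding happens inside `next()`, the consumer never sees a CR, so the state machine needs no special cases for it.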

53 commits
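The insertion point described in db789813c9 (a character offset into the decoded input plus a flag saying whether it is defined) can be modeled roughly as follows. This is a hypothetical sketch, not LibWeb's HTMLTokenizer: `ScriptCreatedInput` and its methods are illustrative names, offsets are byte offsets into a `std::string`, and `std::optional` stands in for the offset-plus-boolean pair. Each `write()` inserts text just before the insertion point and then advances the point past it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of a script-created parser's input buffer with an insertion point.
class ScriptCreatedInput {
public:
    bool is_insertion_point_defined() const { return m_insertion_point.has_value(); }

    // Place the insertion point at the current read position.
    void define_insertion_point() { m_insertion_point = m_position; }
    void undefine_insertion_point() { m_insertion_point.reset(); }

    // Insert text just before the insertion point (as document.write() does),
    // then move the insertion point past the new text.
    void write(std::string_view text)
    {
        assert(is_insertion_point_defined());
        assert(*m_insertion_point <= m_input.size());
        m_input.insert(*m_insertion_point, text.data(), text.size());
        *m_insertion_point += text.size();
    }

    // Consume one byte of input; returns false when nothing is left.
    bool consume(char& out)
    {
        if (m_position >= m_input.size())
            return false;
        out = m_input[m_position++];
        return true;
    }

    std::string const& buffer() const { return m_input; }

private:
    std::string m_input;
    std::size_t m_position { 0 };
    std::optional<std::size_t> m_insertion_point;
};
```

Advancing the insertion point after each write is what keeps two consecutive writes of "a" and "b" in call order; if the point stayed put, the second write would land before the first and produce "ba".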