0ct0pu5/ladybird

Author	SHA1	Message	Date
Linus Groh	6e7459322d	AK: Remove StringBuilder::build() in favor of to_deprecated_string() Having an alias function that only wraps another one is silly, and keeping the more obvious name should flush out more uses of deprecated strings. No behavior change.	2023-01-27 20:38:49 +00:00
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Andreas Kling	c79e8aab0a	LibWeb: Make ON_WHITESPACE less heavy in HTML tokenizer Once we know that the current code point is an ASCII character, we can just check if it's one of the HTML whitespace characters. Before this patch, we were using the generic StringView::contains(u32) path that splats a code point into a StringBuilder and then searches for it with memmem(). This reduces time spent in the HTML tokenizer from 16% to 6% when loading the ECMA-262 spec.	2022-11-05 00:31:11 +01:00
Andreas Kling	ab8432783e	LibWeb: Implement aborting the HTML parser This is roughly on-spec, although I had to invent a simple "aborted" state for the tokenizer.	2022-09-20 23:44:59 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
stelar7	e547f5887e	LibWeb: Fix Array OOBs in the HTMLTokenizer Accessing last() if there are no elements makes WebContent crash :^)	2022-06-03 12:29:11 +01:00
Andreas Kling	1061c863f8	LibWeb: Fix issue where double-quoted doctype system ID was not captured We were storing double-quoted system ID's in the public ID field. 1% progression on ACID3. :^)	2022-03-02 12:30:15 +01:00
Lorenz Steinert	db789813c9	LibWeb: Add basic support for dynamic markup insertion This implements basic support for dynamic markup insertion, adding * Document::open() * Document::write(Vector<String> const&) * Document::writeln(Vector<String> const&) * Document::close() The HTMLParser is modified to make it possible to create a script-created parser which initially only contains an HTMLTokenizer without any data. Additionally, the HTMLParser::run method gains an overload which does not modify the Document and does not run HTMLParser::the_end(), so that we can reenter the parser at a later time. Furthermore, all FIXMEs that concern the insertion point, which is defined in the HTMLTokenizer, are implemented. Additionally, the following member variables of the HTMLParser are now exposed by getter functions: * m_tokenizer * m_aborted * m_script_nesting_level The HTMLTokenizer is modified so that it contains an insertion point which keeps track of where the next input from the Document::write functions will be inserted. The insertion point is implemented as the character offset into m_decoded_input and a boolean describing if the insertion point is defined. Functions to update, check and {re}store the insertion point are also added. The function HTMLTokenizer::insert_eof is added to tell a script-created parser that Document::close was called and HTMLParser::the_end() should be called. Lastly, an explicit default constructor is added to HTMLTokenizer to create an empty HTMLTokenizer into which data can be inserted.	2022-02-21 18:26:43 +01:00
Adam Hodgen	b6eaefa87d	LibWeb: Fix 'Comment end state' in HTML Tokenizer Also, update the expected hash in the LibWeb TestHTMLTokenizer regression test. This is due to the "This comment has a few too many dashes." comment token being updated.	2022-02-21 16:31:45 +01:00
Adam Hodgen	d73bb2633c	LibWeb: Implement tokenization newline preprocessing Newline normalization will replace \r and \r\n with \n. The spec specifically states > Before the tokenization stage, the input stream must be preprocessed > by normalizing newlines. whereas this patch implements the processing during the tokenization itself. This should still exhibit the same behaviour, while keeping the tokenization logic in the same place.	2022-02-21 16:31:45 +01:00
Adam Hodgen	c6fcdd0f93	LibWeb: Fix off by one error in HTML Tokenizer In 'NamedCharacterReference' we attempt to look up the code point by an identifier, e.g. &apos; becomes ' This is done by passing the entire rest of the document to the `HTML::code_points_from_entity` function. However, before this change we didn't send the final character, which meant that if the document ended in a named character reference the lookup would fail.	2022-02-21 16:31:45 +01:00
Andreas Kling	25504f6a1b	LibWeb: Use Vector::clear_with_capacity() in HTMLTokenizer This avoids constantly reallocating the Vector<HTMLToken>.	2022-02-19 14:45:59 +01:00
Linus Groh	892f6394b8	LibWeb: Implement state switch for "[CDATA[" in HTML parser	2022-02-15 23:24:34 +01:00
Linus Groh	f61fb08492	LibWeb: Add spec links to each HTML tokenizer state section I didn't add full spec comments this time, but this is better than nothing :^)	2022-02-15 23:24:34 +01:00
Karol Kosek	c157c2148f	LibWeb: Don't emit current token on EOF in HTML Tokenizer Emitting tokens on EOF caused an infinite loop, freezing the app, which could be a bit annoying when writing an HTML comment at the end of the file in Text Editor. :^)	2022-02-14 12:50:44 +03:30
Karol Kosek	fb5e2670d6	LibWeb: Fix highlighting HTML comments Commit `b193351a99` caused the HTML comments to flash when changing the text cursor. Also, when double-clicking on a comment, the selection started from the beginning of the file instead. The following message was displayed when `TOKENIZER_TRACE_DEBUG` was enabled: (Tokenizer::nth_last_position) Invalid position requested: 4th-last of 4. Returning (0-0). Changing the `nth_last_position` to 3 fixes this. I'm guessing that's because the parser is at that moment on the second hyphen of the `<!--` string, so it has to go back only by three characters.	2022-02-14 12:50:44 +03:30
MacDue	b193351a99	LibWeb: Fix off-by-one in HTMLTokenizer::restore_to() The difference should be between m_utf8_iterator and the new position; if m_prev_utf8_iterator is used, one fewer source position is popped than required. This issue was not apparent on most pages since restore_to is used for tokens such as <!doctype> that are normally followed by a newline that resets the column to zero, but it can be seen on pages with minified HTML.	2022-02-13 14:51:09 +00:00
Sam Atkins	197759e30f	LibWeb: Fix off-by-one error when highlighting unquoted HTML attributes This fixes #11166	2021-12-10 21:27:13 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Andreas Kling	f67648f872	LibWeb: Rename HTMLDocumentParser => HTMLParser	2021-09-25 23:36:43 +02:00
ovf	898b8ffcb6	LibWeb: Avoid assertion failure on parsing numeric character references	2021-07-28 18:32:22 +02:00
ovf	13c7d55320	LibWeb: Fix parsing of character references in attribute values	2021-07-27 00:03:43 +02:00
Max Wipfli	ccae0cae45	LibWeb: Rename HTMLToken::doctype_data() => ensure_doctype_data() This renames the accessor to better reflect what it does, as this will allocate a DoctypeData struct if there is none.	2021-07-17 16:24:57 +04:30
Max Wipfli	25cba4387b	LibWeb: Add HTMLToken(Type) constructor and use it	2021-07-17 16:24:57 +04:30
Max Wipfli	f2e3c770f9	LibWeb: Use setter for HTMLToken::m_{start,end}_position	2021-07-17 16:24:57 +04:30
Max Wipfli	8b31e41692	LibWeb: Change HTMLToken::m_doctype into named DoctypeData struct This is in preparation for an upcoming storage change of HTMLToken. In contrast to the other token types, the accessor can hand out a mutable reference to allow users to change parts of the DoctypeData easily.	2021-07-17 16:24:57 +04:30
Max Wipfli	918bde98b1	LibWeb: Hide implementation details of HTMLToken attribute list Previously, HTMLToken would expose the Vector<Attribute> directly to its users. In preparation for a future change, all users now use implementation-agnostic APIs which do not expose the Vector directly.	2021-07-17 16:24:57 +04:30
Max Wipfli	15d8635afc	LibWeb: Use getter+setter for HTMLToken tag name and self-closing flag	2021-07-17 16:24:57 +04:30
Max Wipfli	1aeafcc58b	LibWeb: Use getter and setter for Character type HTMLTokens While storing the code point in a UTF-8 encoded String is horrendously inefficient, this problem will be addressed at a later stage.	2021-07-17 16:24:57 +04:30
Max Wipfli	e8e9426b4f	LibWeb: Use getter and setter for Comment type HTMLTokens	2021-07-17 16:24:57 +04:30
Max Wipfli	f886aa15b8	LibWeb: Rename HTMLToken::AttributeBuilder struct to Attribute This does not contain StringBuilders anymore, so it can do with a simpler name: Attribute.	2021-07-17 16:24:57 +04:30
Max Wipfli	e22a34badb	LibWeb: Fix assertion failures in HTMLTokenizer The *TagName states are all very similar, so it seems to be correct to apply the fix from #8761 to all of those states. This fixes #8788.	2021-07-16 11:55:55 +02:00
Max Wipfli	2404ad6897	LibWeb: Fix assertion failure when tokenizing JS regex literals This fixes parsing the following regular expression: /</g; It also adds a simple script element to the HTMLTokenizer regression test, which also contains that specific regex.	2021-07-15 01:47:22 +02:00
Max Wipfli	bb2aed7d76	LibWeb: Correct behavior of Comment* states in HTMLTokenizer Previously, this would lead to assertion failures when parsing HTML comments. This fixes #8757.	2021-07-15 00:48:45 +02:00
Max Wipfli	af0b483123	LibWeb: VERIFY an empty builder when emitting tokens in HTMLTokenizer	2021-07-15 00:48:45 +02:00
Max Wipfli	125982943a	LibWeb: Change HTMLTokenizer.{cpp,h} to east const style	2021-07-14 23:03:36 +02:00
Gunnar Beutner	300823c314	LibWeb: Use move() when enqueuing tokens in HTMLTokenizer We're not using the current token anymore once it's enqueued so let's use move() when enqueuing the tokens.	2021-07-14 23:03:36 +02:00
Gunnar Beutner	c3ad8e9a52	LibWeb: Remove StringBuilder from HTMLToken::m_comment_or_character	2021-07-14 23:03:36 +02:00
Gunnar Beutner	3aa202c432	LibWeb: Remove StringBuilder from HTMLToken::m_tag	2021-07-14 23:03:36 +02:00
Gunnar Beutner	901d71148b	LibWeb: Remove StringBuilders from HTMLToken::AttributeBuilder	2021-07-14 23:03:36 +02:00
Gunnar Beutner	992964aa7d	LibWeb: Remove StringBuilders from HTMLToken::m_doctype	2021-07-14 23:03:36 +02:00
Gunnar Beutner	d9e52997e2	LibWeb: Use an Optional<String> to track the last HTML start tag Using an HTMLToken object here is unnecessary because the only attribute we're interested in is the tag_name.	2021-07-14 23:03:36 +02:00
Andreas Kling	dc65f54c06	AK: Rename Vector::append(Vector) => Vector::extend(Vector) Let's make it a bit more clear when we're appending the elements from one vector to the end of another vector.	2021-06-12 13:24:45 +02:00
Max Wipfli	282a623853	LibWeb: Change a few source end positions in HTMLTokenizer This patch aims to fix wrong highlighting for some cases in HTML's syntax highlighter. The values were somewhat experimentally determined and are subject to change. Regardless, it should be more correct with this patch than without it. :^)	2021-06-05 00:32:28 +04:30
Max Wipfli	932161e581	LibWeb: Be more forgiving when adding source positions in HTMLTokenizer This patch changes HTMLTokenizer::nth_last_position to not fail if the requested position is not available. Rather, it will just return (0-0). While this is not the correct solution, it prevents the tokenizer from crashing just because it cannot find a source position. This should only affect SyntaxHighlighter.	2021-06-05 00:32:28 +04:30
Max Wipfli	bc8d16ad28	Everywhere: Replace ctype.h to avoid narrowing conversions This replaces ctype.h with CharacterType.h everywhere I could find issues with narrowing conversions. While using it will probably make sense almost everywhere in the future, the most critical places should have been addressed.	2021-06-03 13:31:46 +02:00
Andreas Kling	407d6cd9e4	AK: Rename Utf8CodepointIterator => Utf8CodePointIterator	2021-06-01 09:45:52 +02:00
Ali Mohammad Pur	1822d6b8ac	LibWeb: Fix invalid behaviour of HTMLTokenizer::skip() and restore_to() skip() is supposed to end up keeping the previous iterator only one index behind the current one, and restore_to() should actually do the restore instead of just removing the now-useless source positions. Fixes #7331.	2021-05-21 09:22:35 +02:00
Ali Mohammad Pur	97a230e4ef	LibWeb: Add a super basic HTML syntax highlighter This can currently highlight tag names and attribute names/values.	2021-05-20 22:06:45 +02:00
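
The newline normalization described in commit d73bb2633c (replace \r and \r\n with \n) can be sketched roughly as follows. This is a standalone illustration using the C++ standard library, not the code from the commit: the commit performs the normalization inside the tokenizer while it consumes input, whereas this sketch runs as a separate pass. The names normalize_newlines and check_normalize_newlines are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not Ladybird's API): returns a copy of the input in
// which every "\r\n" pair and every lone "\r" has been replaced by "\n".
std::string normalize_newlines(std::string_view input)
{
    std::string output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '\r') {
            // "\r\n" collapses into one "\n": skip the following "\n".
            if (i + 1 < input.size() && input[i + 1] == '\n')
                ++i;
            output.push_back('\n');
        } else {
            output.push_back(input[i]);
        }
    }
    return output;
}

// Small usage check: mixed line endings all become "\n".
void check_normalize_newlines()
{
    assert(normalize_newlines("a\r\nb\rc\n") == "a\nb\nc\n");
}
```

Doing this as a separate pass matches the spec wording quoted in the commit message; folding it into the tokenizer, as the commit does, avoids a second walk over the input.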

59 commits
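
The idea behind commit c79e8aab0a (once a code point is known to be ASCII, compare it directly against the whitespace characters instead of running a generic substring search) can be illustrated with the sketch below. It is an assumption-laden example rather than the tokenizer's actual macro: the function name is hypothetical, and the set of characters checked (tab, line feed, form feed, space) is assumed from the whitespace characters the HTML tokenizer states name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Ladybird's ON_WHITESPACE macro): returns true if
// the code point is one of the whitespace characters assumed here.
constexpr bool is_tokenizer_whitespace(std::uint32_t code_point)
{
    // Anything outside ASCII can be rejected immediately.
    if (code_point > 0x7F)
        return false;
    // A direct comparison needs no temporary string and no memmem() search.
    switch (code_point) {
    case '\t':
    case '\n':
    case '\f':
    case ' ':
        return true;
    default:
        return false;
    }
}

// Small usage check covering a match, an ASCII miss, and a non-ASCII miss.
inline void check_is_tokenizer_whitespace()
{
    assert(is_tokenizer_whitespace(' '));
    assert(!is_tokenizer_whitespace('a'));
    assert(!is_tokenizer_whitespace(0x00A0)); // NO-BREAK SPACE is not ASCII
}
```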