0ct0pu5/ladybird

Author	SHA1	Message	Date
Luke Warlow	ce8d3d17c4	LibWeb: Implement unsafe HTML parsing methods Both Element's and ShadowRoot's setHTMLUnsafe, and Document's static parseHTMLUnsafe methods are implemented.	2024-06-26 06:13:29 +02:00
Andreas Kling	e62db9c118	LibWeb: Update HTML fragment serialization for declarative shadow DOM	2024-06-25 19:22:35 +02:00
Andreas Kling	9eb4b91168	LibWeb: Parse declarative shadow DOM template elements We now honor the shadowrootmode attribute on template elements while parsing, and instantiate a shadow tree as required by the spec.	2024-06-25 19:22:35 +02:00
Andreas Kling	870a954e11	LibWeb: Implement Element.outerHTML This piggybacks on the same fragment serialization code that innerHTML uses, but instead of constructing an imaginary parent element like the spec asks us to, we just add a separate serialization mode that includes the context element in the serialized markup. This makes the image carousel on https://utah.edu/ show up :^)	2024-04-09 18:17:14 -04:00
Timothy Flynn	48fb343230	LibWeb: Change HTMLParser's factory to accept the encoding as StringView No need to force an allocation. This makes a future patch a bit simpler, where we will have the encoding as a String. With this patch, we won't have to convert it to a ByteString.	2024-04-04 11:23:21 +02:00
Shannon Booth	e800605ad3	AK+LibURL: Move AK::URL into a new URL library This URL library ends up being a relatively fundamental base library of the system, as LibCore depends on LibURL. This change has two main benefits: * Moving AK back more towards being an agnostic library that can be used between the kernel and userspace. URL has never really fit that description - and is not used in the kernel. * URL _should_ depend on LibUnicode, as it needs punycode support. However, it's not really possible to do this inside of AK as it can't depend on any external library. This change brings us a little closer to being able to do that, but unfortunately we aren't there quite yet, as the code generators depend on LibCore.	2024-03-18 14:06:28 -04:00
Shannon Booth	9ce8189f21	Everywhere: Use unqualified AK::URL Now possible in LibWeb now that there is no longer a Web::URL.	2024-02-25 08:54:31 +01:00
Timothy Flynn	af57bd5cca	LibWeb: Stop parsing after `document.write` at the insertion point If a call to `document.write` inserts an incomplete HTML tag, e.g.: document.write("<p"); we would previously continue parsing the document until we reached a closing angle bracket. However, the spec states we should stop once we reach the new insertion point.	2024-02-20 17:04:36 +01:00
Bastiaan van der Plaat	a681429dff	LibWeb: Remove DOM element deprecated_get_attribute()	2024-01-19 13:12:54 -07:00
Sam Atkins	6ffda5f271	LibWeb: Make HTMLParser::the_end() callable from outside This is a little awkward: The spec requires when loading media documents or ones that don't have a DOM, that we "act as if the user agent had stopped parsing document" which means following this algorithm. Only a few steps require an HTMLParser, but those that do, involve reaching into its internals. The simplest solution I could think of (other than duplicating this fairly hefty function) is making it static and taking a Document and optional HTMLParser as parameters.	2023-12-26 18:35:29 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only | grep \.cpp\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Andreas Kling	bfd354492e	LibWeb: Put most LibWeb GC objects in type-specific heap blocks With this change, we now have ~1200 CellAllocators across both LibJS and LibWeb in a normal WebContent instance. This gives us a minimum heap size of 4.7 MiB in the scenario where we only have one cell allocated per type. Of course, in practice there will be many more of each type, so the effective overhead is quite a bit smaller than that in practice. I left a few types unconverted to this mechanism because I got tired of doing this. :^)	2023-11-19 22:00:48 +01:00
Shannon Booth	a8fd4fab00	LibWeb: Port HTMLParser::serialize_html_fragment from DeprecatedString	2023-11-11 08:50:25 +01:00
Shannon Booth	8fbf72b5bf	LibWeb: Port HTMLToken prefix and namespace to Optional<FlyString> Previously these were DeprecatedStrings that contained a null state. After the null state was removed, the nullability of these members was broken. This doesn't seem to cause any problems currently as the HTML parser is not inserting attributes with their full qualified name, but after we fix that problem, this bug surfaces.	2023-11-05 11:16:16 +00:00
Andreas Kling	6b20a109c6	LibWeb: Pass DOM namespace strings as FlyString in more places	2023-11-04 21:28:30 +01:00
Shannon Booth	9303e9e76f	LibWeb: Port Element::local_name and TagNames from DeprecatedString Which pretty much needs to be done together due to the number of places where they are compared together. This also involves porting StackOfOpenElements over to FlyString from DeprecatedFlyString to prevent a gazillion `.to_deprecated_fly_string` calls in HTMLParser.	2023-10-03 14:47:53 +01:00
Shannon Booth	bc54560e59	LibWeb: Add Web::HTML::parse_legacy_color_value This function follows the "rules for parsing a legacy color value" which is used in some legacy attributes, such as 'bgcolor' in the body element.	2023-05-28 13:24:37 +02:00
Matthew Olsson	c0b2fa74ac	LibWeb: Fix a few const-ness issues	2023-03-06 13:05:43 +00:00
Timothy Flynn	f3db548a3d	AK+Everywhere: Rename FlyString to DeprecatedFlyString DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, B) write a new FlyString class that is tied to String.	2023-01-09 23:00:24 +00:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Andreas Kling	6e0f80fbe0	LibWeb: Make the HTMLParser GC-allocated This prevents a reference cycle between a HTMLParser opened via document.open() and the document. It was one of many things keeping some documents alive indefinitely.	2022-10-20 15:16:23 +02:00
Luke Wilde	7b8a6b8e7a	LibWeb: Set HTMLParser::m_scripting_enabled as according to the spec This allows <noscript> elements to display their content as proper HTML instead of raw text when scripting is disabled.	2022-09-23 22:25:09 +01:00
Andreas Kling	ab8432783e	LibWeb: Implement aborting the HTML parser This is roughly on-spec, although I had to invent a simple "aborted" state for the tokenizer.	2022-09-20 23:44:59 +02:00
Andreas Kling	6f433c8656	LibWeb+LibJS: Make the EventTarget hierarchy (incl. DOM) GC-allocated This is a monster patch that turns all EventTargets into GC-allocated PlatformObjects. Their C++ wrapper classes are removed, and the LibJS garbage collector is now responsible for their lifetimes. There's a fair amount of hacks and band-aids in this patch, and we'll have a lot of cleanup to do after this.	2022-09-06 00:27:09 +02:00
Andreas Kling	1956c52c68	LibWeb: Remove unused HTML::parse_html_document()	2022-04-06 19:35:07 +02:00
Idan Horowitz	086969277e	Everywhere: Run clang-format	2022-04-01 21:24:45 +01:00
Andreas Kling	fda25f9505	LibWeb: Move HTML dimension value parsing from CSS to HTML namespace These are part of HTML, not CSS, so let's not confuse things.	2022-03-26 17:31:01 +01:00
Simon Wanner	1d95745901	LibWeb: Implement the rest of the Adoption Agency Algorithm This gets us 2 points on html5test.com :^) - Before: https://html5te.st/4cf57659bc08272e (208) - After: https://html5te.st/fb8a9259bda1c115 (210)	2022-03-20 02:52:37 +01:00
Luke Wilde	46c0d0f7ae	LibWeb: Associate form elements with a form in parsing and dynamically This makes it available for all form associated elements and not just select and input elements. It also makes it more spec compliant, especially around the form attribute. The main thing missing is re-associating form elements with a form attribute when the form attribute changes or an element with an ID is inserted/removed or has its ID changed.	2022-03-01 23:19:41 +01:00
Andreas Kling	8b2499b112	LibWeb: Make document.write() work while document is parsing This necessitated making HTMLParser ref-counted, and having it register itself with Document when created. That makes it possible for scripts to add new input at the current parser insertion point. There is now a reference cycle between Document and HTMLParser. This cycle is explicitly broken by calling Document::detach_parser() at the end of HTMLParser::run(). This is a huge progression on ACID3, from 31% to 49%! :^)	2022-02-21 22:00:28 +01:00
Lorenz Steinert	db789813c9	LibWeb: Add basic support for dynamic markup insertion This implements basic support for dynamic markup insertion, adding * Document::open() * Document::write(Vector<String> const&) * Document::writeln(Vector<String> const&) * Document::close() The HTMLParser is modified to make it possible to create a script-created parser which initially only contains a HTMLTokenizer without any data. Additionally the HTMLParser::run method gains an overload which does not modify the Document and does not run HTMLParser::the_end() so that we can reenter the parser at a later time. Furthermore all FIXMEs that concern the insertion point, which is defined in the HTMLTokenizer, are implemented. Additionally the following member variables of the HTMLParser are now exposed by getter functions: * m_tokenizer * m_aborted * m_script_nesting_level The HTMLTokenizer is modified so that it contains an insertion point which keeps track of where the next input from the Document::write functions will be inserted. The insertion point is implemented as the character offset into m_decoded_input and a boolean describing if the insertion point is defined. Functions to update, check and {re}store the insertion point are also added. The function HTMLTokenizer::insert_eof is added to tell a script-created parser that document::close was called and HTMLParser::the_end() should be called. Lastly an explicit default constructor is added to HTMLTokenizer to create an empty HTMLTokenizer into which data can be inserted.	2022-02-21 18:26:43 +01:00
Linus Groh	892f6394b8	LibWeb: Implement state switch for "[CDATA[" in HTML parser	2022-02-15 23:24:34 +01:00
Linus Groh	9130ecfd5e	LibWeb: Remove unused HTMLParser function declaration There is no implementation of this function: HTMLParser::stack_of_open_elements_has_element_with_tag_name_in_scope	2022-02-15 23:24:34 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Andreas Kling	e452550fda	LibWeb: Split out "The end" from the HTML parsing spec to a function Also add a spec link and some comments.	2021-09-26 00:04:33 +02:00
Andreas Kling	f67648f872	LibWeb: Rename HTMLDocumentParser => HTMLParser	2021-09-25 23:36:43 +02:00

36 commits