Commit graph

240 commits

Author SHA1 Message Date
MacDue
fc41c282ec LibWeb: Fix utf16-be check in HTMLEncodingDetection
The utf-16be check mistakenly skipped index 3, so was not checking the
correct bytes. This meant UTF16-BE files could fail to decode.
2024-01-08 23:35:09 +01:00
MacDue
5e973fca0b LibWeb: Prevent OOB access in HTMLEncodingDetection for input of '</'
Previously, this never checked if `position + 2` was valid. This
slightly reorders the loop so all indices are checked.

Fixes #22163
2024-01-08 23:35:09 +01:00
Aliaksandr Kalenik
07928129dd LibWeb: Wait until new document becomes active before running scripts
Fixes https://github.com/SerenityOS/serenity/issues/22485

With this change WebContent does not crash when `location.reload()` is
invoked but `Navigable::reload()` still not working because of spec
issue (https://github.com/whatwg/html/issues/9869) so we can't add a
test yet.
2023-12-30 19:32:31 +01:00
Andreas Kling
9ce267944c LibWeb: Fix crash in HTML encoding detection when handling non-ASCII
The fix here was to stop using StringBuilder::append(char) when told to
append a code point, and switch to StringBuilder::append_code_point(u32)

There's probably a bunch more issues like this, and we should stop using
append(char) in general since it allows building of garbage strings.
2023-12-30 13:49:50 +01:00
Andreas Kling
83f43310fa LibWeb: Add spec comments and fixups to "get an attribute" prescan algo
In particular, make some minor adjustments so it flows a little more
like the spec.
2023-12-30 13:49:50 +01:00
Sam Atkins
6ffda5f271 LibWeb: Make HTMLParser::the_end() callable from outside
This is a little awkward: The spec requires when loading media documents
or ones that don't have a DOM, that we "act as if the user agent had
stopped parsing document" which means following this algorithm. Only a
few steps require an HTMLParser, but those that do, involve reaching
into its internals. The simplest solution I could think of (other than
duplicating this fairly hefty function) is making it static and taking
a Document and optional HTMLParser as parameters.
2023-12-26 18:35:29 +01:00
Shannon Booth
e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

For us.

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Bastiaan van der Plaat
b439431488 LibWeb: Allow hr elements in select and optgroup elements 2023-12-09 22:06:20 +01:00
Shannon Booth
f976ec005c LibWeb: Port DOM::Document from DeprecatedString 2023-12-02 22:54:53 +01:00
Sam Atkins
6c5450f9ce LibWeb: Report if anything is delaying load event, not the count
Some elements that delay the load event are more complicated than a
simple count will allow for. We'll implement those in a bit!
2023-12-01 10:28:02 +01:00
Andreas Kling
bfd354492e LibWeb: Put most LibWeb GC objects in type-specific heap blocks
With this change, we now have ~1200 CellAllocators across both LibJS and
LibWeb in a normal WebContent instance.

This gives us a minimum heap size of 4.7 MiB in the scenario where we
only have one cell allocated per type. Of course, in practice there will
be many more of each type, so the effective overhead is quite a bit
smaller than that in practice.

I left a few types unconverted to this mechanism because I got tired of
doing this. :^)
2023-11-19 22:00:48 +01:00
Shannon Booth
87a4a5b302 LibWeb: Remove FIXMe's for HTML attribute serialization steps
As far as I can tell all of these steps are just equivalent to using the
qualified name. Add some tests which cover some of these cases, and
remove the FIXME's.
2023-11-11 08:50:25 +01:00
Shannon Booth
96fc1741b5 LibWeb: Return an Optional<String> from HTMLToken::attribute
Move away from using a nullable StringView.
2023-11-11 08:50:25 +01:00
Shannon Booth
72bb928dd8 LibWeb: Add spec comments to HTMLParser::handle_in_body
I have been going down into a bit of a rabbit hole trying to figure out
why the namespace is not getting set up properly on certain attributes.
At one stage, I thought the issue might have been around here where
attributes were being adjusted (it is not). I started adding spec
comments to understand what was happening, and by the time I realised it
wasn't in this place, I was already in too deep!

Add a whole bunch of spec comments, and leave one or two minor FIXME's
where the spec seems to have changed since this was originally
implemented.
2023-11-11 08:50:25 +01:00
Shannon Booth
a8fd4fab00 LibWeb: Port HTMLParser::serialize_html_fragment from DeprecatedString 2023-11-11 08:50:25 +01:00
Shannon Booth
326b34c7c7 LibWeb: Port all callers of Element::namespace to Element::namespace_uri
Removing some more use of DeprecatedFlyString
2023-11-06 11:37:08 +01:00
Shannon Booth
c8a4fc6c1a LibWeb: Port HTML parser quirk public IDs to StringView
These were DeprecatedFlyStrings, but had no reason to be. We were not
making use of the O(1) lookup, so instead of porting it over to a
FlyString, just make it a StringView.
2023-11-06 11:37:08 +01:00
Shannon Booth
1f8d72da8e LibWeb: Port HTMLToken::to_deprecated_string to new AK String 2023-11-06 11:37:08 +01:00
Shannon Booth
4821d284c6 LibWeb: Add support for inline SVG element scripts 2023-11-05 11:16:16 +00:00
Shannon Booth
e5d45eeeb1 LibWeb: Properly append attributes to element when creating an Element
The main behavioural difference here is that the full qualified name is
appended to the element, rather than just the local name and value.
2023-11-05 11:16:16 +00:00
Shannon Booth
8fbf72b5bf LibWeb: Port HTMLToken prefix and namespace to Optional<FlyString>
Previously these were DeprecatedStrings that contained a null state.
After the null state was removed, the nullability of these members was
broken. This doesn't seem to cause any problems currently as the HTML
parser is not inserting attributes with their full qualified name, but
after we fix that problem, this bug surfaces.
2023-11-05 11:16:16 +00:00
Shannon Booth
fcde808308 LibWeb: Avoid copy of local_name in HTMLParser::create_element_for 2023-11-05 11:16:16 +00:00
Shannon Booth
907be5a96e LibWeb: Add spec comment for HTMLParser::adjusted_current_node
I've found myself looking at this function a bunch while debugging.
2023-11-05 11:16:16 +00:00
Andreas Kling
3ff81dcb65 LibWeb: Make Web::Namespace::Foo strings be FlyString
This required dealing with a *lot* of fallout, but it's all basically
just switching from DeprecatedFlyString to either FlyString or
Optional<FlyString> in a hundred places to accommodate the change.
2023-11-04 21:28:30 +01:00
Andreas Kling
6b20a109c6 LibWeb: Pass DOM namespace strings as FlyString in more places 2023-11-04 21:28:30 +01:00
Andreas Kling
b341aeb5c1 LibWeb: Switch HTMLToken and HTMLTokenizer to String & FlyString 2023-11-04 21:28:30 +01:00
Andreas Kling
f052823f5f LibWeb: Use FlyString for create_element() namespace strings 2023-11-04 21:28:30 +01:00
Shannon Booth
79ed72adb4 LibWeb: Port HTMLToken::make_start_tag from DeprecatedFlyString 2023-10-08 08:11:48 -04:00
Shannon Booth
7aac7002d1 LibWeb: Port SVG::TagNames from DeprecatedFlyString 2023-10-08 08:11:48 -04:00
Shannon Booth
d8635fe541 LibWeb: Port HTMLParser local name and value from DeprecatedString 2023-10-08 08:11:48 -04:00
Shannon Booth
e4f8c59210 LibWeb: Port AttributeNames to FlyString 2023-10-08 08:11:48 -04:00
Shannon Booth
4321606bba LibWeb: Port Element interface from DeprecatedString
This is the last IDL interface which was using DeprecatedString! :^)
2023-10-06 08:25:40 +02:00
Shannon Booth
ff72436448 LibWeb: Add a FlyString version of Element::tag_name
Renaming the DeprecatedString version of this function to
deprecated_tag_name. A FlyString is used here as we often need to
perform equality checks here, and the HTMLParser already has tag_name as
a FlyString.

Remove a FIXME while we're at it - we were already following the spec
there, and we still are :^)
2023-10-03 14:47:53 +01:00
Shannon Booth
9303e9e76f LibWeb: Port Element::local_name and TagNames from Deprecated String
Which pretty much needs to be done together due to the amount of places
where they are compared together.

This also involves porting over StackOfOpenElements over to FlyString
from DeprecatedFly string to prevent a gazillion calls to
`.to_deprecated_fly_string` calls in HTMLParser.
2023-10-03 14:47:53 +01:00
Shannon Booth
3bd04d2c58 LibWeb: Port Attr interface from DeprecatedString to String
There are an unfortunate number of DeprecatedString conversions required
here, but these should all fall away and look much more pretty again
when other places are also ported away from DeprecatedString.

Leaves only the Element IDL interface left :^)
2023-09-25 15:39:29 +02:00
Shannon Booth
60c32f39a1 LibWeb: Do not crash when parsing a SVG script element
Just leave a FIXME dbgln message instead. This works around a crash seen
in html5test.com.
2023-09-23 11:41:57 +02:00
Shannon Booth
6de9d2820f LibWeb: Add spec comments to 'process the rules for foreign content' 2023-09-23 11:41:57 +02:00
Shannon Booth
b603e860af LibWeb: Port CharacterData from DeprecatedString to String
The existing implementation has some pre-existing issues where it is
incorrectly assumes that byte offsets are given through the IDL instead
of UTF-16 code units. While making these changes, leave some FIXMEs for
that.
2023-09-19 10:54:07 +02:00
Shannon Booth
e74031a396 LibWeb: Port Document interface from DeprecatedString to String 2023-09-16 11:17:19 +02:00
Shannon Booth
49eb3bfb1d LibWeb: Make Document::run_the_document_write_steps take a StringView
Which flows on down into HTMLTokenizer::insert_input_at_insertion_point.
2023-09-13 07:26:35 +02:00
Shannon Booth
bcb6851c07 LibWeb: Port Text interface from DeprecatedString to String 2023-09-06 11:44:45 -04:00
Shannon Booth
cc1e4c5cb3 LibWeb: Port Comment interface from DeprecatedString to String 2023-09-06 11:44:45 -04:00
Timothy Flynn
fea440055a LibWeb: Track the byte offset of an HTMLToken's position
We currently track the [line, column] position of every HTMLToken, as
this is what is needed for LibGUI's syntax highlighting. Some non-LibGUI
purposes (e.g. highlighting HTML with HTML) require a byte offset. Track
both during tokenization.
2023-08-29 08:11:11 -04:00
Andrew Kaster
6e64bf5464 LibWeb: Remove outdated old_queue_global_event_with_document
The FIXME here describes an old constraint on JS Interpreters which no
longer holds. It hails from a time when we had the global object and
JS realm attached to the document.
2023-08-28 12:57:05 +02:00
Shannon Booth
ebdfe2e863 LibWeb: Port DocumentType from DeprecatedString to String 2023-08-27 05:34:54 +02:00
MacDue
71baa8c31a LibWeb: Add CSSPixels::nearest_value_for(FloatingPoint)
This is intended to annotate conversions from unknown floating-point
values to CSSPixels, and make it more obvious the fp value will be
rounded to the nearest fixed-point value.
2023-08-26 23:53:45 +02:00
MacDue
360c0eb509 LibWeb: Remove implicit conversion from float and double to CSSPixels
In general it is not safe to convert any arbitrary floating-point value
to CSSPixels. CSSPixels has a resolution of 0.015625, which for small
values (e.g. scale factors between 0 and 1), can produce bad results
if converted to CSSPixels then scaled back up. In the worst case values
can underflow to zero and produce incorrect results.
2023-08-26 23:53:45 +02:00
Timothy Flynn
5a2bf7fdd1 LibWeb: Set the correct end position of HTML attribute names
We were previously setting the end position of attribute names in self-
closing HTML tags to the end of the attribute value. To illustrate the
previous behavior, consider this tag and its attribute's start and end
positions (shown inclusively below):

    <meta charset="UTF-8" />
          ^ name start
                  ^ value start
                        ^ value end
                        ^ name end

Rather than setting the end position of the attribute name when we parse
the closing slash, ensure the end position is already set while we are
in the AttributeName state. We now have:

    <meta charset="UTF-8" />
          ^ name start
                ^ name end
                  ^ value start
                        ^ value end

The tokenizer unit test has been extended to test these positions.
2023-08-25 08:22:24 +02:00
Timothy Flynn
5b2bc90b50 LibWeb: Set consistent positions for the start and end of HTML tags
To illustrate the previous behavior, consider these tags and their start
and end positions (shown inclusively below):

    Start tag:    End tag:
    <span>        </span>
     ^ start       ^ start
         ^end           ^end

The start position of a tag is the first ASCII-alpha code point after
the opening brace. The start position of a close tag is the slash just
before the first ASCII-alpha code point. And the end position of both
is the closing brace. So the opening brace is not included in the
emitted tag, but the closing brace is. And the end tag including the
slash is an oddity that had to be worked around in its only use case
(syntax highlighting).

We now consistently exclude the braces from the emitted tag, and also
exclude the slash from the end tag, so that it does not need to be
accounted for in syntax highlighting. That is, we now have:

    Start tag:    End tag:
    <span>        </span>
     ^ start        ^ start
        ^end           ^end

The tokenizer unit test has been extended to test these positions.
2023-08-25 08:22:24 +02:00