Commit graph

21 commits

Author SHA1 Message Date
MacDue
fc41c282ec LibWeb: Fix utf16-be check in HTMLEncodingDetection
The utf-16be check mistakenly skipped index 3, so was not checking the
correct bytes. This meant UTF16-BE files could fail to decode.
2024-01-08 23:35:09 +01:00
MacDue
5e973fca0b LibWeb: Prevent OOB access in HTMLEncodingDetection for input of '</'
Previously, this never checked if `position + 2` was valid. This
slightly reorders the loop so all indices are checked.

Fixes #22163
2024-01-08 23:35:09 +01:00
Andreas Kling
9ce267944c LibWeb: Fix crash in HTML encoding detection when handling non-ASCII
The fix here was to stop using StringBuilder::append(char) when told to
append a code point, and switch to StringBuilder::append_code_point(u32)

There's probably a bunch more issues like this, and we should stop using
append(char) in general since it allows building of garbage strings.
2023-12-30 13:49:50 +01:00
Andreas Kling
83f43310fa LibWeb: Add spec comments and fixups to "get an attribute" prescan algo
In particular, make some minor adjustments so it flows a little more
like the spec.
2023-12-30 13:49:50 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Shannon Booth
3bd04d2c58 LibWeb: Port Attr interface from DeprecatedString to String
There are an unfortunate number of DeprecatedString conversions required
here, but these should all fall away and look much more pretty again
when other places are also ported away from DeprecatedString.

Leaves only the Element IDL interface left :^)
2023-09-25 15:39:29 +02:00
Andreas Kling
72c9f56c66 LibJS: Make Heap::allocate<T>() infallible
Stop worrying about tiny OOMs. Work towards #20449.

While going through these, I also changed the function signature in many
places where returning ThrowCompletionOr<T> is no longer necessary.
2023-08-13 15:38:42 +02:00
Kenneth Myhra
50c5f0d7da LibWeb: Make factory method of DOM::Attr fallible 2023-02-18 00:52:47 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
fb21271334 LibWeb: Replace incorrect uses of AK::is_ascii_space() 2022-10-02 21:32:49 +02:00
Andreas Kling
530675993b LibWeb: Rename Attribute to Attr
This name is not very good, but it's what the specification calls it.
2022-09-18 02:08:01 +02:00
Andreas Kling
6f433c8656 LibWeb+LibJS: Make the EventTarget hierarchy (incl. DOM) GC-allocated
This is a monster patch that turns all EventTargets into GC-allocated
PlatformObjects. Their C++ wrapper classes are removed, and the LibJS
garbage collector is now responsible for their lifetimes.

There's a fair amount of hacks and band-aids in this patch, and we'll
have a lot of cleanup to do after this.
2022-09-06 00:27:09 +02:00
sin-ack
c8585b77d2 Everywhere: Replace single-char StringView op. arguments with chars
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Hendiadyoin1
6a95df2526 LibTextCodec: Don't allocate Strings on encoding normalisation
This ripples down to LibWeb's HTML and XHR decoders, which therefore
become less allocation heavy.
2022-03-21 10:48:17 +01:00
Timothy Flynn
e01dfaac9a LibWeb: Implement Attribute closer to the spec and with an IDL file
Note our Attribute class is what the spec refers to as just "Attr". The
main differences between the existing implementation and the spec are
just that the spec defines more fields.

Attributes can contain namespace URIs and prefixes. However, note that
these are not parsed in HTML documents unless the document content-type
is XML. So for now, these are initialized to null. Web pages are able to
set the namespace via JavaScript (setAttributeNS), so these fields may
be filled in when the corresponding APIs are implemented.

The main change to be aware of is that an attribute is a node. This has
implications on how attributes are stored in the Element class. Nodes
are non-copyable and non-movable because these constructors are deleted
by the EventTarget base class. This means attributes cannot be stored in
a Vector or HashMap as these containers assume copyability / movability.
So for now, the Vector holding attributes is changed to hold RefPtrs to
attributes instead. This might change when attribute storage is
implemented according to the spec (by way of NamedNodeMap).
2021-10-17 13:51:10 +01:00
Andreas Kling
cb895edad4 LibWeb: Move Attribute into the DOM namespace 2021-09-16 01:39:47 +02:00
Luke
e9eae9d880 LibWeb: Add extracting character encoding from a meta content attribute
Some Gmail emails contain this.
2021-07-13 20:23:44 +02:00
Max Wipfli
f808279769 LibWeb: Implement encoding sniffing algorithm
This patch implements the HTML specification's "encoding sniffing
algorithm", which is used when no encoding can be obtained from the
Content-Type header (either because it doesn't contain a charset=...)
value or the file has not been opened via HTTP (as with local files).

It also modifies the creator of the HTMLDocumentParser to use the new
HTMLDocumentParser::create_with_uncertain_encoding static method, which
runs the encoding sniffing algorithm before instantiating the parser.

This now allows us to load local HTML pages (or remote pages without a
charset specified in the 'Content-Type' header) with a non-UTF-8
encoding such as 'windows-1252'. This would previously crash the
browser. :^)
2021-05-18 21:02:07 +02:00