Commit graph

30 commits

Author SHA1 Message Date
Sam Atkins
89c5f25016 LibWeb/CSS: Remove tiny-oom propagation from CSS Tokenizer 2024-07-26 17:29:20 +02:00
Sam Atkins
1a5533e528 LibWeb: Tokenize CSS numbers as doubles
Every later stage uses doubles, so dropping that precision right at the
start of parsing is a little silly. :^)
2023-08-20 14:25:18 +01:00
Sam Atkins
c138845013 LibWeb: Store the original representation of CSS tokens
This is required for the `<urange>` type, and custom properties, to work
correctly, as both need to know exactly what the original text was.
2023-03-22 19:45:40 +01:00
Sam Atkins
84af8dd9ed LibWeb: Propagate errors from CSS Tokenizer 2023-03-07 00:43:36 +01:00
Sam Atkins
17618989a3 LibWeb: Propagate errors from CSS Tokenizer construction
Instead of constructing a Tokenizer and then calling parse() on it, we
now call `Tokenizer::tokenize(...)` directly. (Renamed from `parse()`
because this is a Tokenizer, not a Parser.)
2023-03-07 00:43:36 +01:00
Sam Atkins
3685a8813a LibWeb: Port CSS Tokenizer to new Strings
Specifically, this uses FlyString, because the data gets held long-term
as a FlyString anyway.
2023-02-15 12:48:26 -05:00
Sam Atkins
8af65108e4 LibWeb: Construct CSS Tokenizer and Parser with a StringView encoding
This doesn't need to be a full (Deprecated)String, so let's not force it
to be.
2023-02-15 12:48:26 -05:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Sam Atkins
97e174afcd LibWeb: Use the term "ident sequence" instead of "name"
This is an editorial change in the December 2021 version of SYNTAX-3:
https://www.w3.org/TR/2021/CRD-css-syntax-3-20211224/
2022-10-03 17:09:41 +01:00
Sam Atkins
bf786d66b1 LibWeb: Move Token and Tokenizer into Parser namespace 2022-04-12 23:03:46 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Sam Atkins
fe372cd073 LibWeb: Use CSS::Number for Token numeric values 2022-03-22 15:47:36 +01:00
Sam Atkins
0795b9f7bb LibWeb: Use floats instead of doubles for CSS numbers
Using doubles isn't necessary, and they make things slightly bigger and
slower, so let's use floats instead.
2022-03-22 15:47:36 +01:00
Sam Atkins
269a24d4ca LibWeb: Pass correct values to would_start_an_identifier()
Same as with would_start_a_number(), we were skipping a code point.
2021-12-27 22:56:08 +01:00
Sam Atkins
bb82ee5530 LibWeb: Pass correct values to would_start_a_number()
This fixes the crash that Luke found using Domato:
```css
. foo {
    mso-border-alt: solid  .-1pt;
}
```

The spec distinguishes between "If the next 3 code points would
start..." and "If the input stream starts with..." but we were treating
them the same way, skipping the first code point in the process.
2021-12-27 22:56:08 +01:00
Sam Atkins
981badb45f LibWeb: Add CSS::Tokenizer::start_of_input_stream_[twin|triplet]()
These correspond to "If the input stream starts with..." in the spec,
which up until now we were not handling correctly, which led to some fun
bugs.

As noted, reconsuming the input code point in order to read its value is
hacky, but works. Keeping track of the current code point in Tokenizer
would be nicer, when I'm feeling brave enough to mess with it!
2021-12-27 22:56:08 +01:00
Sam Atkins
f6869797a7 LibWeb: Convert numeric tokens to numbers in CSS Tokenizer
The spec wants us to produce numeric values as the Tokenizer sees them,
rather than waiting until the parse stage. This is a first step towards
that.
2021-11-19 22:35:05 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Sam Atkins
ecf5368535 LibWeb: Record position information in CSS Tokens
This is a requirement to be able to use the Tokens for syntax
highlighting.
2021-10-23 19:07:44 +02:00
Sam Atkins
9a2eecaca4 LibWeb: Add CSS Tokenizer::consume_as_much_whitespace_as_possible()
This is a step in the spec in 3 places, and we had it implemented
differently in each one. This unifies them and makes it clearer what
we're doing.
2021-10-23 19:07:44 +02:00
Sam Atkins
dfbdc20f87 LibWeb: Add spec links to CSS Tokenizer
Also renamed `starts_with_a_number()` -> `would_start_a_number()` to
better match spec terminology.
2021-10-23 19:07:44 +02:00
Sam Atkins
c249fbd17c LibWeb: Correct escape handling in CSS Tokenizer
Calling is_valid_escape_sequence() with no arguments hides what it
is operating on, so I have removed that, so that you must explicitly
tell it what you are testing.

The call from consume_a_token() was using the wrong tokens, so it
returned false incorrectly. This was resulting in corrupted output
when faced with this code from Acid2. (Abbreviated)

```css
.parser { error: \}; }
.parser { }
```
2021-07-11 23:19:56 +02:00
Sam Atkins
b7116711bf LibWeb: Add TokenStream class to CSS Parser
The entry points for CSS parsing in the spec are defined as accepting
any of a stream of Tokens, or a stream of ComponentValues, or a String.
TokenStream is an attempt to reduce the duplication of code for that.
2021-07-11 23:19:56 +02:00
Sam Atkins
9c14504bbb LibWeb: Rename CSS::Token::TokenType -> Type 2021-07-11 23:19:56 +02:00
Sam Atkins
985ed47a38 LibWeb: Use EOF code point instead of Optional in CSS Tokenizer
Optional seems like a good idea, but in many places we were not
checking if it had a value, which was causing crashes when the
Tokenizer was given malformed input. Using an EOF value along with
is_eof() makes things a lot simpler.
2021-07-11 23:19:56 +02:00
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Andreas Kling
407d6cd9e4 AK: Rename Utf8CodepointIterator => Utf8CodePointIterator 2021-06-01 09:45:52 +02:00
Linus Groh
649d2faeab Everywhere: Use "the SerenityOS developers." in copyright headers
We had some inconsistencies before:

- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."

I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.

By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
2021-04-29 00:59:26 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
078f0a5c67 LibWeb: Add specification-based CSS tokenizer
Original work by @stelar7 for #2628.
2021-03-09 17:35:38 +01:00