Commit graph

27 commits

Author SHA1 Message Date
Sam Atkins
bb82ee5530 LibWeb: Pass correct values to would_start_a_number()
This fixes the crash that Luke found using Domato:
```css
. foo {
    mso-border-alt: solid  .-1pt;
}
```

The spec distinguishes between "If the next 3 code points would
start..." and "If the input stream starts with..." but we were treating
them the same way, skipping the first code point in the process.
2021-12-27 22:56:08 +01:00
Sam Atkins
981badb45f LibWeb: Add CSS::Tokenizer::start_of_input_stream_[twin|triplet]()
These correspond to "If the input stream starts with..." in the spec,
which up until now we were not handling correctly, which led to some fun
bugs.

As noted, reconsuming the input code point in order to read its value is
hacky, but works. Keeping track of the current code point in Tokenizer
would be nicer, when I'm feeling brave enough to mess with it!
2021-12-27 22:56:08 +01:00
Sam Atkins
85e5586a27 LibWeb: Add spec comments to CSS Tokenizer
Some of the code has been slightly rearranged to match the spec order,
but otherwise I've tried not to mess with it.
2021-11-19 22:35:05 +01:00
Sam Atkins
9403cc42f9 LibWeb: Convert CSS Token::m_value from StringBuilder to FlyString
Again, this value does not change once we have finished creating the
Token, so it can be more lightweight.
2021-11-19 22:35:05 +01:00
Sam Atkins
75e7c2c5c0 LibWeb: Convert CSS Token::m_unit from StringBuilder to FlyString
This value doesn't change once it's assigned to the Token, so it can be
more lightweight than a StringBuilder.
2021-11-19 22:35:05 +01:00
Sam Atkins
f6869797a7 LibWeb: Convert numeric tokens to numbers in CSS Tokenizer
The spec wants us to produce numeric values as the Tokenizer sees them,
rather than waiting until the parse stage. This is a first step towards
that.
2021-11-19 22:35:05 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Sam Atkins
ecf5368535 LibWeb: Record position information in CSS Tokens
This is a requirement to be able to use the Tokens for syntax
highlighting.
2021-10-23 19:07:44 +02:00
Sam Atkins
9a2eecaca4 LibWeb: Add CSS Tokenizer::consume_as_much_whitespace_as_possible()
This is a step in the spec in 3 places, and we had it implemented
differently in each one. This unifies them and makes it clearer what
we're doing.
2021-10-23 19:07:44 +02:00
Sam Atkins
dfbdc20f87 LibWeb: Add spec links to CSS Tokenizer
Also renamed `starts_with_a_number()` -> `would_start_a_number()` to
better match spec terminology.
2021-10-23 19:07:44 +02:00
Sam Atkins
bb1cc99750 LibWeb: Stop treating EOF as a valid part of an identifier
This was specifically causing the string "0" to be parsed as an invalid
Dimension token with no units, instead of as a Number. That then caused
out generated `property_initial_value()` function to fail for those
values.
2021-09-17 23:06:45 +02:00
sin-ack
d9900ece2f LibWeb: Preprocess the CSS stream in the Tokenizer
This commit implements the input preprocessing algorithm that CSS Syntax
Module Level 3 defines.
2021-08-30 00:08:40 +02:00
Sam Atkins
74c9587798 LibWeb: Fix EOF handling in CSS Tokenizer peek_{twin,triplet}()
Previously, the loops would stop before reaching EOF, meaning that the
values that should have been set to EOF were left with their 0 initial
values. Now, we initialize to EOFs instead. The if/else inside the loops
always ran the else branch so I have removed the if branches.
2021-08-04 19:04:12 +04:30
Sam Atkins
e54531244f LibWeb: Define proper debug symbols for CSS Parser and Tokenizer
You can now turn debug logging for them on using `CSS_PARSER_DEBUG` and
`CSS_TOKENIZER_DEBUG`.
2021-07-31 00:18:11 +02:00
Sam Atkins
7439fbd896 LibWeb: Get CSS @import rules working in new parser
Also added css-import.html, which tests the 3 syntax variations on
`@import` statements. Note that the optional media-query parameter to
`@import` is not handled yet.
2021-07-31 00:18:11 +02:00
Sam Atkins
c249fbd17c LibWeb: Correct escape handling in CSS Tokenizer
Calling is_valid_escape_sequence() with no arguments hides what it
is operating on, so I have removed that, so that you must explicitly
tell it what you are testing.

The call from consume_a_token() was using the wrong tokens, so it
returned false incorrectly. This was resulting in corrupted output
when faced with this code from Acid2. (Abbreviated)

```css
.parser { error: \}; }
.parser { }
```
2021-07-11 23:19:56 +02:00
Sam Atkins
b7116711bf LibWeb: Add TokenStream class to CSS Parser
The entry points for CSS parsing in the spec are defined as accepting
any of a stream of Tokens, or a stream of ComponentValues, or a String.
TokenStream is an attempt to reduce the duplication of code for that.
2021-07-11 23:19:56 +02:00
Sam Atkins
6c03123b2d LibWeb: Give CSS Token and StyleComponentValueRule matching is() funcs
The end goal here is to make the two classes mostly interchangeable, as
the CSS spec requires that the various parser algorithms can take a
stream of either class, and we want to have that functionality without
needing to duplicate all of the code.
2021-07-11 23:19:56 +02:00
Sam Atkins
9c14504bbb LibWeb: Rename CSS::Token::TokenType -> Type 2021-07-11 23:19:56 +02:00
Sam Atkins
985ed47a38 LibWeb: Use EOF code point instead of Optional in CSS Tokenizer
Optional seems like a good idea, but in many places we were not
checking if it had a value, which was causing crashes when the
Tokenizer was given malformed input. Using an EOF value along with
is_eof() makes things a lot simpler.
2021-07-11 23:19:56 +02:00
Sam Atkins
9115c23bd5 LibWeb: Fix greedy CSS Tokenizer whitespace parsing
Whitespace parsing was too greedy, consuming the first non-
whitespace character after it.
2021-07-11 23:19:56 +02:00
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Linus Groh
649d2faeab Everywhere: Use "the SerenityOS developers." in copyright headers
We had some inconsistencies before:

- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."

I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.

By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
2021-04-29 00:59:26 +02:00
Brian Gianforcaro
1411ae1bc7 LibWeb: Utilize SourceLocation for CSS/Tokenizer logging 2021-04-25 09:32:03 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
078f0a5c67 LibWeb: Add specification-based CSS tokenizer
Original work by @stelar7 for #2628.
2021-03-09 17:35:38 +01:00