Commit graph

18 commits

Author SHA1 Message Date
Sam Atkins
dfbdc20f87 LibWeb: Add spec links to CSS Tokenizer
Also renamed `starts_with_a_number()` -> `would_start_a_number()` to
better match spec terminology.
2021-10-23 19:07:44 +02:00
Sam Atkins
bb1cc99750 LibWeb: Stop treating EOF as a valid part of an identifier
This was specifically causing the string "0" to be parsed as an invalid
Dimension token with no units, instead of as a Number. That then caused
out generated `property_initial_value()` function to fail for those
values.
2021-09-17 23:06:45 +02:00
sin-ack
d9900ece2f LibWeb: Preprocess the CSS stream in the Tokenizer
This commit implements the input preprocessing algorithm that CSS Syntax
Module Level 3 defines.
2021-08-30 00:08:40 +02:00
Sam Atkins
74c9587798 LibWeb: Fix EOF handling in CSS Tokenizer peek_{twin,triplet}()
Previously, the loops would stop before reaching EOF, meaning that the
values that should have been set to EOF were left with their 0 initial
values. Now, we initialize to EOFs instead. The if/else inside the loops
always ran the else branch so I have removed the if branches.
2021-08-04 19:04:12 +04:30
Sam Atkins
e54531244f LibWeb: Define proper debug symbols for CSS Parser and Tokenizer
You can now turn debug logging for them on using `CSS_PARSER_DEBUG` and
`CSS_TOKENIZER_DEBUG`.
2021-07-31 00:18:11 +02:00
Sam Atkins
7439fbd896 LibWeb: Get CSS @import rules working in new parser
Also added css-import.html, which tests the 3 syntax variations on
`@import` statements. Note that the optional media-query parameter to
`@import` is not handled yet.
2021-07-31 00:18:11 +02:00
Sam Atkins
c249fbd17c LibWeb: Correct escape handling in CSS Tokenizer
Calling is_valid_escape_sequence() with no arguments hides what it
is operating on, so I have removed that, so that you must explicitly
tell it what you are testing.

The call from consume_a_token() was using the wrong tokens, so it
returned false incorrectly. This was resulting in corrupted output
when faced with this code from Acid2. (Abbreviated)

```css
.parser { error: \}; }
.parser { }
```
2021-07-11 23:19:56 +02:00
Sam Atkins
b7116711bf LibWeb: Add TokenStream class to CSS Parser
The entry points for CSS parsing in the spec are defined as accepting
any of a stream of Tokens, or a stream of ComponentValues, or a String.
TokenStream is an attempt to reduce the duplication of code for that.
2021-07-11 23:19:56 +02:00
Sam Atkins
6c03123b2d LibWeb: Give CSS Token and StyleComponentValueRule matching is() funcs
The end goal here is to make the two classes mostly interchangeable, as
the CSS spec requires that the various parser algorithms can take a
stream of either class, and we want to have that functionality without
needing to duplicate all of the code.
2021-07-11 23:19:56 +02:00
Sam Atkins
9c14504bbb LibWeb: Rename CSS::Token::TokenType -> Type 2021-07-11 23:19:56 +02:00
Sam Atkins
985ed47a38 LibWeb: Use EOF code point instead of Optional in CSS Tokenizer
Optional seems like a good idea, but in many places we were not
checking if it had a value, which was causing crashes when the
Tokenizer was given malformed input. Using an EOF value along with
is_eof() makes things a lot simpler.
2021-07-11 23:19:56 +02:00
Sam Atkins
9115c23bd5 LibWeb: Fix greedy CSS Tokenizer whitespace parsing
Whitespace parsing was too greedy, consuming the first non-
whitespace character after it.
2021-07-11 23:19:56 +02:00
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Linus Groh
649d2faeab Everywhere: Use "the SerenityOS developers." in copyright headers
We had some inconsistencies before:

- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."

I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.

By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
2021-04-29 00:59:26 +02:00
Brian Gianforcaro
1411ae1bc7 LibWeb: Utilize SourceLocation for CSS/Tokenizer logging 2021-04-25 09:32:03 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
078f0a5c67 LibWeb: Add specification-based CSS tokenizer
Original work by @stelar7 for #2628.
2021-03-09 17:35:38 +01:00