This fixes the crash that Luke found using Domato:
```css
. foo {
mso-border-alt: solid .-1pt;
}
```
The spec distinguishes between "If the next 3 code points would
start..." and "If the input stream starts with...", but we were treating
them the same way, skipping the first code point in the process.
These correspond to "If the input stream starts with..." in the spec.
Until now we were not handling them correctly, which led to some fun
bugs.
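To make the difference concrete, here is a minimal C++ sketch rather than the LibWeb code. `CodePointStream`, `would_start_number()` and the two wrapper functions are hypothetical names, and EOF is modelled as the out-of-range value 0xFFFFFFFF. The number check follows the spec's "check if three code points would start a number" algorithm. The wrappers differ only in whether the current input code point is part of the triple. The sample input is the `.-1pt` value from the Domato reproducer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical EOF marker; any value outside the Unicode range works.
constexpr char32_t EOF_CP = 0xFFFFFFFF;

// Hypothetical stream: `current` is the current input code point,
// `position` indexes the next input code point.
struct CodePointStream {
    std::u32string input;
    std::size_t position = 0;
    char32_t current = EOF_CP;

    char32_t peek(std::size_t offset = 0) const {
        std::size_t index = position + offset;
        return index < input.size() ? input[index] : EOF_CP;
    }

    char32_t consume() {
        current = peek();
        ++position;
        return current;
    }
};

bool is_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

// "Check if three code points would start a number".
bool would_start_number(char32_t first, char32_t second, char32_t third) {
    if (first == U'+' || first == U'-') {
        if (is_digit(second))
            return true;
        return second == U'.' && is_digit(third);
    }
    if (first == U'.')
        return is_digit(second);
    return is_digit(first);
}

// "If the next 3 code points would start a number": the current code point is NOT included.
bool next_three_would_start_number(CodePointStream const& s) {
    return would_start_number(s.peek(0), s.peek(1), s.peek(2));
}

// "If the input stream starts with a number": the current code point plus the next two.
bool input_stream_starts_with_number(CodePointStream const& s) {
    return would_start_number(s.current, s.peek(0), s.peek(1));
}
```

After consuming the `.` of `.-1pt`, the "input stream starts with" check sees `.`, `-`, `1` and returns false, so the spec's FULL STOP rule emits a <delim-token> for the `.`. The "next 3 code points" check instead sees `-`, `1`, `p` and returns true.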
As noted, reconsuming the current input code point in order to read its
value is hacky, but it works. Keeping track of the current code point in
Tokenizer would be nicer, and I'll do that when I'm feeling brave enough
to mess with it!
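The reconsume workaround can be sketched as follows, assuming a hypothetical tokenizer that tracks only the index of the next input code point and does not store the current one. Stepping back one position turns the current input code point into the next one again, so an ordinary three-code-point peek can see it. The code then steps forward to restore the position. All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr char32_t EOF_CP = 0xFFFFFFFF; // hypothetical EOF marker

// Hypothetical tokenizer that only tracks the index of the next input code point.
struct Tokenizer {
    std::u32string input;
    std::size_t position = 0;

    char32_t peek(std::size_t offset = 0) const {
        std::size_t index = position + offset;
        return index < input.size() ? input[index] : EOF_CP;
    }
    char32_t next_code_point() {
        char32_t c = peek();
        ++position;
        return c;
    }
    void reconsume_current_input_code_point() {
        assert(position > 0); // something must have been consumed already
        --position;
    }
};

bool is_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

bool would_start_number(char32_t first, char32_t second, char32_t third) {
    if (first == U'+' || first == U'-')
        return is_digit(second) || (second == U'.' && is_digit(third));
    if (first == U'.')
        return is_digit(second);
    return is_digit(first);
}

// The workaround: step back so the current code point is peekable, check, then step forward again.
bool input_stream_starts_with_number(Tokenizer& t) {
    t.reconsume_current_input_code_point();
    bool result = would_start_number(t.peek(0), t.peek(1), t.peek(2));
    t.next_code_point();
    return result;
}
```

The round trip leaves the position unchanged, which is why it works, but every such check has to undo its own move. Storing the current code point in the tokenizer would make the check a plain read.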
This was specifically causing the string "0" to be parsed as an invalid
Dimension token with no units, instead of as a Number. That then caused
our generated `property_initial_value()` function to fail for those
values.
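In the spec, the Number/Dimension decision is made at the end of "consume a numeric token". After the number itself is consumed, the token becomes a Dimension only if the next 3 input code points would start an ident sequence, a Percentage if the next code point is `%`, and a Number otherwise. Below is a sketch of that decision, with hypothetical names, an out-of-range EOF sentinel, and the simple "any code point from U+0080 up" rule for non-ASCII ident-start code points. For `"0"` nothing follows the digit, so all three peeked values are EOF and the result has to be Number.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr char32_t EOF_CP = 0xFFFFFFFF; // hypothetical EOF marker

enum class NumericKind { Number, Percentage, Dimension };

// Code point at index i of `s`, or EOF past the end.
char32_t at(std::u32string_view s, std::size_t i) { return i < s.size() ? s[i] : EOF_CP; }

bool is_ident_start(char32_t c) {
    if (c == EOF_CP)
        return false; // EOF is not a code point, so it must not count as "non-ASCII"
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || c == U'_' || c >= 0x80;
}

// After input preprocessing, U+000A is the only newline left.
bool is_valid_escape(char32_t first, char32_t second) {
    return first == U'\\' && second != U'\n';
}

// "Check if three code points would start an ident sequence".
bool would_start_ident_sequence(char32_t first, char32_t second, char32_t third) {
    if (first == U'-')
        return is_ident_start(second) || second == U'-' || is_valid_escape(second, third);
    if (is_ident_start(first))
        return true;
    if (first == U'\\')
        return is_valid_escape(first, second);
    return false;
}

// The tail of "consume a numeric token": `rest` is what follows the number just consumed.
NumericKind classify_numeric_token(std::u32string_view rest) {
    if (would_start_ident_sequence(at(rest, 0), at(rest, 1), at(rest, 2)))
        return NumericKind::Dimension;
    if (at(rest, 0) == U'%')
        return NumericKind::Percentage;
    return NumericKind::Number;
}
```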
Previously, the loops would stop before reaching EOF, meaning that the
values that should have been set to EOF were left with their initial
value of 0. Now, we initialize them to EOF instead. The if/else inside
the loops always ran the else branch, so I have removed the if branches.
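Here is a sketch of the "initialize to EOF first" pattern, using a hypothetical `peek_code_points<N>()` helper in place of the real loops. Every slot starts as EOF and the loop only overwrites slots that exist in the input, so nothing is left holding 0 and the loop body needs no if/else.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr char32_t EOF_CP = 0xFFFFFFFF; // hypothetical EOF marker

// Peek N code points starting at `position`, without consuming them.
template<std::size_t N>
std::array<char32_t, N> peek_code_points(std::u32string_view input, std::size_t position) {
    std::array<char32_t, N> result;
    result.fill(EOF_CP); // start from EOF rather than 0
    // Only slots that have a matching input code point are overwritten.
    for (std::size_t i = 0; i < N && position + i < input.size(); ++i)
        result[i] = input[position + i];
    return result;
}
```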
Also added css-import.html, which tests the 3 syntax variations of
`@import` statements. Note that the optional media-query parameter to
`@import` is not handled yet.
Calling is_valid_escape_sequence() with no arguments hides what it
is operating on, so I have removed the no-argument version. You must
now explicitly tell it what you are testing.
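Here is a sketch of the explicit form as a free function with an assumed signature (not the actual LibWeb declaration). It takes the two code points from the spec's "check if two code points are a valid escape" algorithm, so each call site has to state which pair it means.

```cpp
#include <cassert>

constexpr char32_t EOF_CP = 0xFFFFFFFF; // hypothetical EOF marker

// "Check if two code points are a valid escape". Both inputs are explicit, so the
// caller decides whether they are (current, next) or (next, next + 1).
constexpr bool is_valid_escape_sequence(char32_t first, char32_t second) {
    if (first != U'\\')
        return false;
    // After preprocessing, U+000A is the only newline. EOF is not a newline,
    // so "\" at the end of the input still counts as a valid escape.
    return second != U'\n';
}
```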
The call from consume_a_token() was using the wrong tokens, so it
returned false incorrectly. This resulted in corrupted output when
faced with this (abbreviated) code from Acid2:
```css
.parser { error: \}; }
.parser { }
```
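The spec's U+005C REVERSE SOLIDUS (\) branch of "consume a token" asks whether the input stream starts with a valid escape, i.e. whether the `\` that was just consumed and the code point after it form an escape. Here is a hypothetical sketch of that decision on the Acid2 input. The correct pair is `\` and `}`, which is a valid escape. The following pair, `}` and `;`, is not, which would turn the `\` into a <delim-token> and leave the `}` to be tokenized on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr char32_t EOF_CP = 0xFFFFFFFF; // hypothetical EOF marker

constexpr bool is_valid_escape_sequence(char32_t first, char32_t second) {
    return first == U'\\' && second != U'\n';
}

enum class TokenKind { Ident, Delim };

// Decide what a "\" at index `current` of `input` starts, per the reverse-solidus
// branch of "consume a token". Only the classification is sketched here.
TokenKind classify_reverse_solidus(std::u32string_view input, std::size_t current) {
    auto at = [&](std::size_t i) { return i < input.size() ? input[i] : EOF_CP; };
    // Correct pair: the current input code point ("\") and the next input code point.
    if (is_valid_escape_sequence(at(current), at(current + 1)))
        return TokenKind::Ident; // spec: reconsume, then consume an ident-like token
    return TokenKind::Delim;     // spec: parse error, <delim-token> for "\"
}
```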
The entry points for CSS parsing in the spec are defined as accepting
a stream of Tokens, a stream of ComponentValues, or a String.
TokenStream is an attempt to reduce the duplication of code for that.
The end goal here is to make the two classes mostly interchangeable, as
the CSS spec requires that the various parser algorithms can take a
stream of either class, and we want to have that functionality without
needing to duplicate all of the code.
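Here is a minimal sketch of the idea, assuming simplified stand-ins for Token and ComponentValue and illustrative method names. Making the stream a class template over the element type lets a parser step, here a hypothetical `count_remaining()`, be written once and run over a stream of either class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for Token and ComponentValue; only what the sketch needs.
struct Token {
    std::string text;
    bool is_eof = false;
};
struct ComponentValue {
    std::string text;
    bool is_eof = false;
};

// One stream type for either element class.
template<typename T>
class TokenStream {
public:
    explicit TokenStream(std::vector<T> const& items) : m_items(items) {}

    bool has_next_token() const { return m_position < m_items.size(); }

    T const& peek_token() const { return has_next_token() ? m_items[m_position] : m_eof; }

    T const& next_token() {
        if (!has_next_token())
            return m_eof;
        return m_items[m_position++];
    }

    void reconsume_current_input_token() {
        if (m_position > 0)
            --m_position;
    }

private:
    std::vector<T> const& m_items; // the stream does not own the items
    std::size_t m_position = 0;
    T m_eof { "", true };          // returned once the items run out
};

// Written once, usable with a stream of either class.
template<typename T>
std::size_t count_remaining(TokenStream<T>& stream) {
    std::size_t count = 0;
    while (!stream.next_token().is_eof)
        ++count;
    return count;
}
```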
Using Optional seemed like a good idea, but in many places we were not
checking whether it had a value, which caused crashes when the
Tokenizer was given malformed input. Using an EOF value along with
is_eof() makes things a lot simpler.
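Here is a sketch of the EOF-value approach, with hypothetical names. Because EOF has the same type as real code points, predicates such as `is_digit()` can be applied to it directly and simply return false. A loop like `consume_digits()` therefore stops at the end of the input without any has-a-value check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical EOF value that shares a type with real code points.
constexpr char32_t TOKENIZER_EOF = 0xFFFFFFFF;

constexpr bool is_eof(char32_t c) { return c == TOKENIZER_EOF; }
constexpr bool is_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

struct Input {
    std::u32string_view text;
    std::size_t position = 0;

    char32_t peek() const { return position < text.size() ? text[position] : TOKENIZER_EOF; }
    char32_t next() {
        char32_t c = peek();
        if (!is_eof(c))
            ++position;
        return c;
    }
};

// With std::optional<char32_t>, every comparison would first need a has_value()
// check; here EOF is just a value that is not a digit.
std::u32string consume_digits(Input& in) {
    std::u32string digits;
    while (is_digit(in.peek()))
        digits.push_back(in.next());
    return digits;
}
```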
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
now be addressed.
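The narrowing problem comes from the <ctype.h> classifiers taking an `int` whose value must be representable as `unsigned char` or equal `EOF`. Passing anything else, such as a code point above U+00FF, is undefined behaviour. Here is a sketch of code-point-safe replacements in the spirit of CharacterType.h, with hypothetical names. They take a 32-bit code point directly, so nothing is narrowed at the call site.

```cpp
#include <cassert>

// Hypothetical code-point-safe classifiers: the parameter is a full 32-bit code
// point, so there is no conversion to int or unsigned char at the call site.
constexpr bool is_ascii_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

constexpr bool is_ascii_alpha(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}

constexpr bool is_ascii_hex_digit(char32_t c) {
    return is_ascii_digit(c) || (c >= U'a' && c <= U'f') || (c >= U'A' && c <= U'F');
}

// Non-ASCII code points are simply "not an ASCII digit/letter", never UB.
static_assert(!is_ascii_digit(0x0663), "ARABIC-INDIC DIGIT THREE is not an ASCII digit");
static_assert(!is_ascii_alpha(0x1F600), "an emoji is not an ASCII letter");
```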
We had some inconsistencies before:
- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."
I picked the most common one (lowercase "the", trailing ".") and applied
it to all copyright headers.
By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
SPDX License Identifiers are a more compact and standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
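For illustration, a C++ file header that uses an SPDX identifier might look like this. The BSD 2-Clause license and the placeholder copyright line are examples only:

```cpp
/*
 * Copyright (c) <year>, <copyright holder>
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */
```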
This was done with the `ambr` search and replace tool:
```sh
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
```