This class was being copied all over the place, however, most of these
cases can be easily prevented with `auto const&` or `NonnullRawPtr<>`.
It also didn't have a move constructor, causing `Vector` to copy on
every resize as well.
Removing all these copies results in an almost 15% increase in
performance for CSS parsing, as measured with callgrind.
CSS Syntax 3 (https://drafts.csswg.org/css-syntax) has changed
significantly since we implemented it a couple of years ago. Just about
every parsing algorithm has been rewritten in terms of the new token
stream concept, and to support nested styles. As all of those
algorithms call into each other, this is an unfortunately chonky diff.
As part of this, the transitory types (Declaration, Function, AtRule...)
have been rewritten. That's both because we have new requirements of
what they should be and contain, and also because the spec asks us to
create and then gradually modify them in place, which is easier if they
are plain structs.
When the TokenStream code was originally written, there was no such
concept in the CSS Syntax spec. But since then, it's been officially
added, (https://drafts.csswg.org/css-syntax/#css-token-stream) and the
parsing algorithms are described in terms of it. This patch brings our
implementation in line with the spec. A few deprecated TokenStream
methods are left around until their users are also updated to match the
newer spec.
There are a few differences:
- They name things differently. The main confusing one is we had
`next_token()` which consumed a token and returned it, but the spec
has a `next_token()` which peeks the next token. The spec names are
honestly better than what I'd come up with. (`discard_a_token()` is a
nice addition too!)
- We used to store the index of the token that was just consumed, and
they instead store the index of the token that will be consumed next.
This is a perfect breeding ground for off-by-one errors, so I've
finally added a test suite for TokenStream itself.
- We use a transaction system for rewinding, and the spec uses a stack
of "marks", which can be manually rewound to. These should be able to
coexist as long as we stick with marks in the parser spec algorithms,
and stick with transactions elsewhere.
For a long time, we've used two terms, inconsistently:
- "Identifier" is a spec term, but refers to a sequence of alphanumeric
characters, which may or may not be a keyword. (Keywords are a
subset of all identifiers.)
- "ValueID" is entirely non-spec, and is directly called a "keyword" in
the CSS specs.
So to avoid confusion as much as possible, let's align with the spec
terminology. I've attempted to change variable names as well, but
obviously we use Keywords in a lot of places in LibWeb and so I may
have missed some.
One exception is that I've not renamed "valid-identifiers" in
Properties.json... I'd like to combine that and the "valid-types" array
together eventually, so there's no benefit to doing an extra rename
now.
Instead of a StringView. This allows us to preserve the nice O(1) string
compare property of FlyString, and not needing to allocate when one is
needed.
Ideally all other places in Token should have similar changes done, but
to prevent a huge amount of churn, just change ident for now.