Given a selector like `.foo .bar #baz`, we know that elements with
the class names `foo` and `bar` must be present in the ancestor chain of
the candidate element, or the selector cannot match.
By keeping track of the current ancestor chain during style computation,
and which strings are used in tag names and attribute names, we can do
a quick check before evaluating the selector itself, to see if all the
required ancestors are present.
The way this works:
1. CSS::Selector now has a cache of up to 8 strings that must be present
in the ancestor chain of a matching element. Note that we actually
store string *hashes*, not the strings themselves.
2. When Document performs a recursive style update, we now push and pop
elements to the ancestor chain stack as they are entered and exited.
3. When entering/exiting an ancestor, StyleComputer collects all the
relevant string hashes from that ancestor element and updates a
counting bloom filter.
4. Before evaluating a selector, we first check if any of the hashes
required by the selector are definitely missing from the ancestor
filter. If so, it cannot be a match, and we reject it immediately.
5. Otherwise, we carry on and evaluate the selector as usual.
I originally tried doing this with a HashMap, but we ended up losing
a huge chunk of the time saved to HashMap instead. As it turns out,
a simple counting bloom filter is way better at handling this.
The cost is a flat 8KB per StyleComputer, and since it's a bloom filter,
false positives are a thing.
This is extremely efficient, and allows us to quickly reject the
majority of selectors on many huge websites.
Some example rejection rates:
- https://amazon.com: 77%
- https://github.com/SerenityOS/serenity: 61%
- https://nytimes.com: 57%
- https://store.steampowered.com: 55%
- https://en.wikipedia.org: 45%
- https://youtube.com: 32%
- https://shopify.com: 25%
This also yields a chunky 37% speedup on StyleBench. :^)
As outlined in: https://www.w3.org/TR/selectors-4/#compat
We now do not treat unknown webkit pseudo-elements as invalid at parse
time, and also support serializing these elements.
Fixes: #21959
No functional impact intended. This is just a more complicated way of
writing what we have now.
The goal of this commit is so that we are able to store the 'name' of a
pseudo element for use in serializing 'unknown -webkit-
pseudo-elements', see:
https://www.w3.org/TR/selectors-4/#compat
This is quite awkward, as in pretty much all cases just the selector
type enum is enough, but we will need to cache the name for serializing
these unknown selectors. I can't figure out any reason why we would need
this name anywhere else in the engine, so pretty much everywhere is
still just passing around this raw enum. But this change will allow us
to easily store the name inside of this new struct for when it is needed
for serialization, once those webkit unknown elements are supported by
our engine.
We don't yet set the Document's target element in most cases, so this
does not function very well. But that will improve once we *do* set it,
which involves a more complete Navigables implementation.
For now, we parse these, but don't actually consider the namespace when
matching them. `DOM::Element` does not (yet) store attribute namespaces
so we can't check what they are.
This is basically a name with a namespace prefix. It will be used for
adding namespaces to Universal, TagName, and Attribute selectors.
For convenience, this can also optionally parse/store the `*` wildcard
character as the name.
This class had slightly confusing semantics and the added weirdness
doesn't seem worth it just so we can say "." instead of "->" when
iterating over a vector of NNRPs.
This patch replaces NonnullRefPtrVector<T> with Vector<NNRP<T>>.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
The ::placeholder pseudo element was added in commit 1fbad9c, but the
total number of pseudo elements was not updated. Instead of this manual
bookkeeping, add a dummy value at the end of the enumeration for the
count.
When matching selectors in HTML documents, we know that all the elements
have lowercase local names already (the parser makes sure of this.)
Style sheets still need to remember the original name strings, in case
we want to match against non-HTML content like XML/SVG. To make the
common HTML case faster, we now cache a lowercase version of the name
with each type/class/id SimpleSelector.
This makes tag type checks O(1) instead of O(n).
These will be needed for styling progress bars, sadly this can
only be done with these non-standard selectors. These are, hovever,
in use on real sites such as https://rpcs3.net/compatibility.
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Now that we use a Variant for the SimpleSelector's data, we don't need
to instantiate empty structs or variables for the types that aren't
used, and so we can remove `PseudoElement::None`,
`PsuedoClass::Type::None` and `Attribute::MatchType::None`.
Also, we now always initialize a SimpleSelector with a type, so
`SimpleSelector::Type::Invalid` can go too.
There was no real benefit to creating the SimpleSelector early and then
modifying it, and doing so made this code harder to follow than it
needs to be.
This doesn't have parsing support for multiple languages in the same
selector. Support for language subcodes is not great either. But it
does do the basics.
Computing the pseudo element of a CSS::Selector was very hot when
mousing around on GitHub in Browser. A profile had it at ~10%.
After these changes, it's totally gone from the profile. :^)