Which pretty much needs to be done together, due to the number of
places where they are compared against each other.
This also involves porting StackOfOpenElements from DeprecatedFlyString
to FlyString, to prevent a gazillion `.to_deprecated_fly_string()` calls
in HTMLParser.
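For illustration, the pattern this removes looks roughly like the
following; the stack and element accessors here are made up, only
`.to_deprecated_fly_string()` is the real conversion being eliminated:

    // Hypothetical names, for illustration only.

    // Before: the stack compared DeprecatedFlyString names, so every
    // FlyString tag name had to be converted at the call site.
    if (current_node().local_name() == tag_name.to_deprecated_fly_string())
        pop_current_node();

    // After: both sides are FlyString, so interned names compare directly.
    if (current_node().local_name() == tag_name)
        pop_current_node();
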
This moves some stuff around to make LibGUI depend on LibSyntax instead
of the other way around, as not every application that wishes to do
syntax highlighting is necessarily a LibGUI (or even a GUI) application.
To illustrate the previous behavior, consider these tags and their start
and end positions (shown inclusively below):

    Start tag:       End tag:

    <span>           </span>
     ^ start          ^ start
         ^ end              ^ end

The start position of a start tag is the first ASCII-alpha code point
after the opening angle bracket. The start position of an end tag is the
slash just before the first ASCII-alpha code point. And the end position
of both is the closing angle bracket. So the opening angle bracket is
not included in the emitted tag, but the closing one is. And the end tag
including the slash is an oddity that had to be worked around in its
only use case (syntax highlighting).
We now consistently exclude the angle brackets from the emitted tag,
and also exclude the slash from the end tag, so that it does not need
to be accounted for in syntax highlighting. That is, we now have:

    Start tag:       End tag:

    <span>           </span>
     ^ start           ^ start
        ^ end              ^ end

The tokenizer unit test has been extended to test these positions.
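Roughly, the added assertions look like this (the position accessors and
the 0-based, inclusive columns are assumptions, not the exact test code):

    auto tokens = tokenize("<span></span>"sv);

    // index:  0  1  2  3  4  5  6  7  8  9 10 11 12
    // input:  <  s  p  a  n  >  <  /  s  p  a  n  >

    // Start tag: positions now cover only the name "span".
    EXPECT_EQ(tokens[0].start_position().column, 1u); // 's'
    EXPECT_EQ(tokens[0].end_position().column, 4u);   // 'n'

    // End tag: the '/' and '>' are excluded as well.
    EXPECT_EQ(tokens[1].start_position().column, 8u); // 's'
    EXPECT_EQ(tokens[1].end_position().column, 11u);  // 'n'
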
This adds the regions generated from embedded CSS and JS, and also
regions for HTML block comments.
The glaring omission is that we don't add them for start/end tags. HTML
allows start and end tags to not always match up, and I believe that's
going to require some variation on the adoption-agency algorithm to
make it work correctly.
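As a rough pattern sketch (the types and accessors here are assumptions,
not the actual LibWeb/LibGUI names), the comment case is just one
token's start/end positions turned into one region:

    for (auto& token : tokens) {
        // One region spanning the whole "<!-- ... -->" token.
        if (token.is_comment())
            regions.append({ token.start_position(), token.end_position() });

        // Embedded <style> and <script> bodies are handed to the CSS and
        // JS highlighters, which contribute their own regions at the same
        // document offsets.
    }
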
This will make it easier to support both string types at the same time
while we convert code, and to track down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
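As a sketch of the transitional state (the function here is made up),
the old type and its conversions stay explicit and greppable while the
String name is free for the new type:

    ErrorOr<String> convert_example(DeprecatedString const& old_style)
    {
        // "DeprecatedString" (and ".to_deprecated_string()" at call sites)
        // now names exactly the code that still needs converting.
        return String::from_utf8(old_style.view());
    }
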
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
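As a rough before/after (the member and method names here are
assumptions, not the real HTMLToken API):

    // Before: callers reached into the Vector<Attribute> directly.
    for (auto& attribute : token.m_tag.attributes)
        names.append(attribute.local_name);

    // After: iteration goes through the token's own accessors, so the
    // underlying storage can change without touching every caller.
    token.for_each_attribute([&](auto const& attribute) {
        names.append(attribute.local_name);
        return IterationDecision::Continue;
    });
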
And use them to highlight JavaScript in HTML source.
This commit also changes how TextDocumentSpan::data is interpreted: it
used to be an opaque pointer, but everyone stuffed an enum value inside
it, which made the values not unique across highlighters; that field is
now a u64 serial ID.
The syntax highlighters don't need to change their ways of stuffing
token types into that field, but a highlighter that calls another
nested highlighter needs to register the nested types for use with
token pairs.
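Sketched with made-up names, the nesting requirement looks something
like this:

    // Each highlighter still stuffs its own token types into the u64.
    span.data = static_cast<u64>(HTMLToken::Type::StartTag);

    // But a highlighter that embeds others (HTML hosting CSS and JS) must
    // also register the nested pairs, so that matching-token-pair lookups
    // can tell the nested values apart from its own.
    register_nested_token_pairs(css_highlighter->matching_token_pairs());
    register_nested_token_pairs(js_highlighter->matching_token_pairs());
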
This changes the HTML SyntaxHighlighter to conform to the now-fixed
rendering of syntax highlighting spans in GUI::TextEditor. It also
avoids emitting tokens if they have a zero or negative length.
This fixes a bug where single-character tokens were not highlighted
properly.
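The guard itself is tiny; a sketch with made-up offsets instead of the
real span positions:

    auto emit_span = [&](size_t start, size_t end) {
        // A single-character token is start..start + 1; anything with
        // end <= start has zero or negative length and is dropped.
        if (end <= start)
            return;
        add_span(start, end);
    };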