0ct0pu5/ladybird

Author	SHA1	Message	Date
Sam Atkins	89c5f25016	LibWeb/CSS: Remove tiny-oom propagation from CSS Tokenizer	2024-07-26 17:29:20 +02:00
Sam Atkins	1a5533e528	LibWeb: Tokenize CSS numbers as doubles Every later stage uses doubles, so dropping that precision right at the start of parsing is a little silly. :^)	2023-08-20 14:25:18 +01:00
Sam Atkins	c138845013	LibWeb: Store the original representation of CSS tokens This is required for the `<urange>` type, and custom properties, to work correctly, as both need to know exactly what the original text was.	2023-03-22 19:45:40 +01:00
Sam Atkins	84af8dd9ed	LibWeb: Propagate errors from CSS Tokenizer	2023-03-07 00:43:36 +01:00
Sam Atkins	17618989a3	LibWeb: Propagate errors from CSS Tokenizer construction Instead of constructing a Tokenizer and then calling parse() on it, we now call `Tokenizer::tokenize(...)` directly. (Renamed from `parse()` because this is a Tokenizer, not a Parser.)	2023-03-07 00:43:36 +01:00
Sam Atkins	3685a8813a	LibWeb: Port CSS Tokenizer to new Strings Specifically, this uses FlyString, because the data gets held long-term as a FlyString anyway.	2023-02-15 12:48:26 -05:00
Sam Atkins	8af65108e4	LibWeb: Construct CSS Tokenizer and Parser with a StringView encoding This doesn't need to be a full (Deprecated)String, so let's not force it to be.	2023-02-15 12:48:26 -05:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Sam Atkins	97e174afcd	LibWeb: Use the term "ident sequence" instead of "name" This is an editorial change in the December 2021 version of SYNTAX-3: https://www.w3.org/TR/2021/CRD-css-syntax-3-20211224/	2022-10-03 17:09:41 +01:00
Sam Atkins	bf786d66b1	LibWeb: Move Token and Tokenizer into Parser namespace	2022-04-12 23:03:46 +02:00
Idan Horowitz	086969277e	Everywhere: Run clang-format	2022-04-01 21:24:45 +01:00
Sam Atkins	fe372cd073	LibWeb: Use CSS::Number for Token numeric values	2022-03-22 15:47:36 +01:00
Sam Atkins	0795b9f7bb	LibWeb: Use floats instead of doubles for CSS numbers Using doubles isn't necessary, and they make things slightly bigger and slower, so let's use floats instead.	2022-03-22 15:47:36 +01:00
Sam Atkins	269a24d4ca	LibWeb: Pass correct values to would_start_an_identifier() Same as with would_start_a_number(), we were skipping a code point.	2021-12-27 22:56:08 +01:00
Sam Atkins	bb82ee5530	LibWeb: Pass correct values to would_start_a_number() This fixes the crash that Luke found using Domato: ```css . foo { mso-border-alt: solid .-1pt; } ``` The spec distinguishes between "If the next 3 code points would start..." and "If the input stream starts with..." but we were treating them the same way, skipping the first code point in the process.	2021-12-27 22:56:08 +01:00
Sam Atkins	981badb45f	LibWeb: Add CSS::Tokenizer::start_of_input_stream_[twin|triplet]() These correspond to "If the input stream starts with..." in the spec, which up until now we were not handling correctly, which led to some fun bugs. As noted, reconsuming the input code point in order to read its value is hacky, but works. Keeping track of the current code point in Tokenizer would be nicer, when I'm feeling brave enough to mess with it!	2021-12-27 22:56:08 +01:00
Sam Atkins	f6869797a7	LibWeb: Convert numeric tokens to numbers in CSS Tokenizer The spec wants us to produce numeric values as the Tokenizer sees them, rather than waiting until the parse stage. This is a first step towards that.	2021-11-19 22:35:05 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Sam Atkins	ecf5368535	LibWeb: Record position information in CSS Tokens This is a requirement to be able to use the Tokens for syntax highlighting.	2021-10-23 19:07:44 +02:00
Sam Atkins	9a2eecaca4	LibWeb: Add CSS Tokenizer::consume_as_much_whitespace_as_possible() This is a step in the spec in 3 places, and we had it implemented differently in each one. This unifies them and makes it clearer what we're doing.	2021-10-23 19:07:44 +02:00
Sam Atkins	dfbdc20f87	LibWeb: Add spec links to CSS Tokenizer Also renamed `starts_with_a_number()` -> `would_start_a_number()` to better match spec terminology.	2021-10-23 19:07:44 +02:00
Sam Atkins	c249fbd17c	LibWeb: Correct escape handling in CSS Tokenizer Calling is_valid_escape_sequence() with no arguments hides what it is operating on, so I have removed that, so that you must explicitly tell it what you are testing. The call from consume_a_token() was using the wrong tokens, so it returned false incorrectly. This was resulting in corrupted output when faced with this code from Acid2. (Abbreviated) ```css .parser { error: \}; } .parser { } ```	2021-07-11 23:19:56 +02:00
Sam Atkins	b7116711bf	LibWeb: Add TokenStream class to CSS Parser The entry points for CSS parsing in the spec are defined as accepting any of a stream of Tokens, or a stream of ComponentValues, or a String. TokenStream is an attempt to reduce the duplication of code for that.	2021-07-11 23:19:56 +02:00
Sam Atkins	9c14504bbb	LibWeb: Rename CSS::Token::TokenType -> Type	2021-07-11 23:19:56 +02:00
Sam Atkins	985ed47a38	LibWeb: Use EOF code point instead of Optional in CSS Tokenizer Optional seems like a good idea, but in many places we were not checking if it had a value, which was causing crashes when the Tokenizer was given malformed input. Using an EOF value along with is_eof() makes things a lot simpler.	2021-07-11 23:19:56 +02:00
Andreas Kling	12a42edd13	Everywhere: codepoint => code point	2021-06-01 10:01:11 +02:00
Andreas Kling	407d6cd9e4	AK: Rename Utf8CodepointIterator => Utf8CodePointIterator	2021-06-01 09:45:52 +02:00
Linus Groh	649d2faeab	Everywhere: Use "the SerenityOS developers." in copyright headers We had some inconsistencies before: - Sometimes "The", sometimes "the" - Sometimes trailing ".", sometimes no trailing "." I picked the most common one (lowercase "the", trailing ".") and applied it to all copyright headers. By using the exact same string everywhere we can ensure nothing gets missed during a global search (and replace), and that these inconsistencies are not spread any further (as copyright headers are commonly copied to new files).	2021-04-29 00:59:26 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Andreas Kling	078f0a5c67	LibWeb: Add specification-based CSS tokenizer Original work by @stelar7 for #2628.	2021-03-09 17:35:38 +01:00

30 commits
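
Note on bb82ee5530 and 269a24d4ca: the commit messages describe a bug where the first code point was skipped before calling would_start_a_number(). The sketch below is not the LibWeb code. It is a minimal standalone version of the CSS Syntax Level 3 check "three code points would start a number", written as an assumption of how the check behaves, with a hypothetical is_digit() helper and a free function in place of the Tokenizer member. It shows why the input `.-1pt` gives different answers depending on which three code points are passed in.

```cpp
// Hypothetical helper for this sketch; not a LibWeb function.
constexpr bool is_digit(char32_t c)
{
    return c >= U'0' && c <= U'9';
}

// Sketch of the spec check "would these three code points start a number?".
// Free function for illustration only; the real Tokenizer member may differ.
constexpr bool would_start_a_number(char32_t first, char32_t second, char32_t third)
{
    // A sign must be followed by a digit, or by '.' and then a digit.
    if (first == U'+' || first == U'-') {
        if (is_digit(second))
            return true;
        return second == U'.' && is_digit(third);
    }
    // A leading '.' must be followed directly by a digit.
    if (first == U'.')
        return is_digit(second);
    // Otherwise only a digit starts a number.
    return is_digit(first);
}

// Input ".-1pt", correct triplet ('.', '-', '1'): not a number.
static_assert(!would_start_a_number(U'.', U'-', U'1'), "'.-1' must not start a number");

// Same input with the first code point skipped ('-', '1', 'p'):
// the check wrongly reports that a number starts here.
static_assert(would_start_a_number(U'-', U'1', U'p'), "'-1p' starts a number");
```

With the correct triplet the tokenizer would treat `.` as a delimiter; with the shifted triplet it would commit to consuming a number from the wrong position, which is consistent with the crash the commit message reports.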