0ct0pu5/ladybird

Author	SHA1	Message	Date
Sam Atkins	bb82ee5530	LibWeb: Pass correct values to would_start_a_number() This fixes the crash that Luke found using Domato: ```css . foo { mso-border-alt: solid .-1pt; } ``` The spec distinguishes between "If the next 3 code points would start..." and "If the input stream starts with..." but we were treating them the same way, skipping the first code point in the process.	2021-12-27 22:56:08 +01:00
Sam Atkins	981badb45f	LibWeb: Add CSS::Tokenizer::start_of_input_stream_[twin|triplet]() These correspond to "If the input stream starts with..." in the spec, which up until now we were not handling correctly, which led to some fun bugs. As noted, reconsuming the input code point in order to read its value is hacky, but works. Keeping track of the current code point in Tokenizer would be nicer, when I'm feeling brave enough to mess with it!	2021-12-27 22:56:08 +01:00
Sam Atkins	85e5586a27	LibWeb: Add spec comments to CSS Tokenizer Some of the code has been slightly rearranged to match the spec order, but otherwise I've tried not to mess with it.	2021-11-19 22:35:05 +01:00
Sam Atkins	9403cc42f9	LibWeb: Convert CSS Token::m_value from StringBuilder to FlyString Again, this value does not change once we have finished creating the Token, so it can be more lightweight.	2021-11-19 22:35:05 +01:00
Sam Atkins	75e7c2c5c0	LibWeb: Convert CSS Token::m_unit from StringBuilder to FlyString This value doesn't change once it's assigned to the Token, so it can be more lightweight than a StringBuilder.	2021-11-19 22:35:05 +01:00
Sam Atkins	f6869797a7	LibWeb: Convert numeric tokens to numbers in CSS Tokenizer The spec wants us to produce numeric values as the Tokenizer sees them, rather than waiting until the parse stage. This is a first step towards that.	2021-11-19 22:35:05 +01:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Sam Atkins	ecf5368535	LibWeb: Record position information in CSS Tokens This is a requirement to be able to use the Tokens for syntax highlighting.	2021-10-23 19:07:44 +02:00
Sam Atkins	9a2eecaca4	LibWeb: Add CSS Tokenizer::consume_as_much_whitespace_as_possible() This is a step in the spec in 3 places, and we had it implemented differently in each one. This unifies them and makes it clearer what we're doing.	2021-10-23 19:07:44 +02:00
Sam Atkins	dfbdc20f87	LibWeb: Add spec links to CSS Tokenizer Also renamed `starts_with_a_number()` -> `would_start_a_number()` to better match spec terminology.	2021-10-23 19:07:44 +02:00
Sam Atkins	bb1cc99750	LibWeb: Stop treating EOF as a valid part of an identifier This was specifically causing the string "0" to be parsed as an invalid Dimension token with no units, instead of as a Number. That then caused our generated `property_initial_value()` function to fail for those values.	2021-09-17 23:06:45 +02:00
sin-ack	d9900ece2f	LibWeb: Preprocess the CSS stream in the Tokenizer This commit implements the input preprocessing algorithm that CSS Syntax Module Level 3 defines.	2021-08-30 00:08:40 +02:00
Sam Atkins	74c9587798	LibWeb: Fix EOF handling in CSS Tokenizer peek_{twin,triplet}() Previously, the loops would stop before reaching EOF, meaning that the values that should have been set to EOF were left with their 0 initial values. Now, we initialize to EOFs instead. The if/else inside the loops always ran the else branch so I have removed the if branches.	2021-08-04 19:04:12 +04:30
Sam Atkins	e54531244f	LibWeb: Define proper debug symbols for CSS Parser and Tokenizer You can now turn debug logging for them on using `CSS_PARSER_DEBUG` and `CSS_TOKENIZER_DEBUG`.	2021-07-31 00:18:11 +02:00
Sam Atkins	7439fbd896	LibWeb: Get CSS @import rules working in new parser Also added css-import.html, which tests the 3 syntax variations on `@import` statements. Note that the optional media-query parameter to `@import` is not handled yet.	2021-07-31 00:18:11 +02:00
Sam Atkins	c249fbd17c	LibWeb: Correct escape handling in CSS Tokenizer Calling is_valid_escape_sequence() with no arguments hides what it is operating on, so I have removed that, so that you must explicitly tell it what you are testing. The call from consume_a_token() was using the wrong tokens, so it returned false incorrectly. This was resulting in corrupted output when faced with this code from Acid2. (Abbreviated) ```css .parser { error: \}; } .parser { } ```	2021-07-11 23:19:56 +02:00
Sam Atkins	b7116711bf	LibWeb: Add TokenStream class to CSS Parser The entry points for CSS parsing in the spec are defined as accepting any of a stream of Tokens, or a stream of ComponentValues, or a String. TokenStream is an attempt to reduce the duplication of code for that.	2021-07-11 23:19:56 +02:00
Sam Atkins	6c03123b2d	LibWeb: Give CSS Token and StyleComponentValueRule matching is() funcs The end goal here is to make the two classes mostly interchangeable, as the CSS spec requires that the various parser algorithms can take a stream of either class, and we want to have that functionality without needing to duplicate all of the code.	2021-07-11 23:19:56 +02:00
Sam Atkins	9c14504bbb	LibWeb: Rename CSS::Token::TokenType -> Type	2021-07-11 23:19:56 +02:00
Sam Atkins	985ed47a38	LibWeb: Use EOF code point instead of Optional in CSS Tokenizer Optional seems like a good idea, but in many places we were not checking if it had a value, which was causing crashes when the Tokenizer was given malformed input. Using an EOF value along with is_eof() makes things a lot simpler.	2021-07-11 23:19:56 +02:00
Sam Atkins	9115c23bd5	LibWeb: Fix greedy CSS Tokenizer whitespace parsing Whitespace parsing was too greedy, consuming the first non-whitespace character after it.	2021-07-11 23:19:56 +02:00
Max Wipfli	bc8d16ad28	Everywhere: Replace ctype.h to avoid narrowing conversions This replaces ctype.h with CharacterType.h everywhere I could find issues with narrowing conversions. While using it will probably make sense almost everywhere in the future, the most critical places should have been addressed.	2021-06-03 13:31:46 +02:00
Andreas Kling	12a42edd13	Everywhere: codepoint => code point	2021-06-01 10:01:11 +02:00
Linus Groh	649d2faeab	Everywhere: Use "the SerenityOS developers." in copyright headers We had some inconsistencies before: - Sometimes "The", sometimes "the" - Sometimes trailing ".", sometimes no trailing "." I picked the most common one (lowercase "the", trailing ".") and applied it to all copyright headers. By using the exact same string everywhere we can ensure nothing gets missed during a global search (and replace), and that these inconsistencies are not spread any further (as copyright headers are commonly copied to new files).	2021-04-29 00:59:26 +02:00
Brian Gianforcaro	1411ae1bc7	LibWeb: Utilize SourceLocation for CSS/Tokenizer logging	2021-04-25 09:32:03 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Andreas Kling	078f0a5c67	LibWeb: Add specification-based CSS tokenizer Original work by @stelar7 for #2628.	2021-03-09 17:35:38 +01:00

27 commits