Commit graph

30 commits

Author SHA1 Message Date
Max Wipfli
1aeafcc58b LibWeb: Use getter and setter for Character type HTMLTokens
While storing the code point in a UTF-8 encoded String in horrendously
inefficient, this problem will be addressed at a later stage.
2021-07-17 16:24:57 +04:30
Max Wipfli
e8e9426b4f LibWeb: User getter and setter for Comment type HTMLTokens 2021-07-17 16:24:57 +04:30
Max Wipfli
f886aa15b8 LibWeb: Rename HTMLToken::AttributeBuilder struct to Attribute
This does not contain StringBuilders anymore, so it can do with a
simpler name: Attribute.
2021-07-17 16:24:57 +04:30
Max Wipfli
e22a34badb LibWeb: Fix assertion failures in HTMLTokenizer
The *TagName states are all very similar, so it seems to be correct to
apply the fix from #8761 to all of those states.

This fixes #8788.
2021-07-16 11:55:55 +02:00
Max Wipfli
2404ad6897 LibWeb: Fix assertion failure when tokenizing JS regex literals
This fixes parsing the following regular expression: /</g;

It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.
2021-07-15 01:47:22 +02:00
Max Wipfli
bb2aed7d76 LibWeb: Correct behavior of Comment* states in HTMLTokenizer
Previously, this would lead to assertion failures when parsing HTML
comments. This fixes #8757.
2021-07-15 00:48:45 +02:00
Max Wipfli
af0b483123 LibWeb: VERIFY an empty builder when emitting tokens in HTMLTokenizer 2021-07-15 00:48:45 +02:00
Max Wipfli
125982943a LibWeb: Change HTMLTokenizer.{cpp,h} to east const style 2021-07-14 23:03:36 +02:00
Gunnar Beutner
300823c314 LibWeb: Use move() when enqueuing tokens in HTMLTokenizer
We're not using the current token anymore once it's enqueued so let's
use move() when enqueuing the tokens.
2021-07-14 23:03:36 +02:00
Gunnar Beutner
c3ad8e9a52 LibWeb: Remove StringBuilder from HTMLToken::m_comment_or_character 2021-07-14 23:03:36 +02:00
Gunnar Beutner
3aa202c432 LibWeb: Remove StringBuilder from HTMLToken::m_tag 2021-07-14 23:03:36 +02:00
Gunnar Beutner
901d71148b LibWeb: Remove StringBuilders from HTMLToken::AttributeBuilder 2021-07-14 23:03:36 +02:00
Gunnar Beutner
992964aa7d LibWeb: Remove StringBuilders from HTMLToken::m_doctype 2021-07-14 23:03:36 +02:00
Gunnar Beutner
d9e52997e2 LibWeb: Use an Optional<String> to track the last HTML start tag
Using an HTMLToken object here is unnecessary because the only
attribute we're interested in is the tag_name.
2021-07-14 23:03:36 +02:00
Andreas Kling
dc65f54c06 AK: Rename Vector::append(Vector) => Vector::extend(Vector)
Let's make it a bit more clear when we're appending the elements from
one vector to the end of another vector.
2021-06-12 13:24:45 +02:00
Max Wipfli
282a623853 LibWeb: Change a few source end positions in HTMLTokenizer
This patch aims to fix wrong highlighting for some cases in HTML's
syntax highlighter. The values were somewhat experimentally determined
are are subject to change. Regardless, it should be more correct with
this patch than without it. :^)
2021-06-05 00:32:28 +04:30
Max Wipfli
932161e581 LibWeb: Be more forgiving when adding source positions in HTMLTokenizer
This patch changes HTMLTokenizer::nth_last_position to not fail if the
requested position is not available. Rather, it will just return (0-0).

While this is not the correct solution, it prevents the tokenizer from
crashing just because it cannot find a source position. This should only
affect SyntaxHighlighter.
2021-06-05 00:32:28 +04:30
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Andreas Kling
407d6cd9e4 AK: Rename Utf8CodepointIterator => Utf8CodePointIterator 2021-06-01 09:45:52 +02:00
Ali Mohammad Pur
1822d6b8ac LibWeb: Fix invalid behaviour of HTMLTokenizer::skip() and restore_to()
skip() is supposed to end up keeping the previous iterator only one
index behind the current one, and restore_to() should actually do the
restore instead of just removing the now-useless source positions.
Fixes #7331.
2021-05-21 09:22:35 +02:00
Ali Mohammad Pur
97a230e4ef LibWeb: Add a super basic HTML syntax highlighter
This can currently highlight tag names and attribute names/values.
2021-05-20 22:06:45 +02:00
Ali Mohammad Pur
aa7939bc6c LibWeb: Add position tracking information to HTML tokens 2021-05-20 22:06:45 +02:00
Brian Gianforcaro
6d69c97b99 LibWeb: Utilize SourceLocation for HTMLTokenizer logging 2021-04-25 09:32:03 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
AnotherTest
09a43969ba Everywhere: Replace dbgln<flag>(...) with dbgln_if(flag, ...)
Replacement made by `find Kernel Userland -name '*.h' -o -name '*.cpp' | sed -i -Ee 's/dbgln\b<(\w+)>\(/dbgln_if(\1, /g'`
2021-02-08 18:08:55 +01:00
asynts
8465683dcf Everywhere: Debug macros instead of constexpr.
This was done with the following script:

    find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/dbgln<debug_([a-z_]+)>/dbgln<\U\1_DEBUG>/' {} \;

    find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/if constexpr \(debug_([a-z0-9_]+)/if constexpr \(\U\1_DEBUG/' {} \;
2021-01-25 09:47:36 +01:00
asynts
1a3a0836c0 Everywhere: Use CMake to generate AK/Debug.h.
This was done with the help of several scripts, I dump them here to
easily find them later:

    awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in

    for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
    do
        find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
    done

    # Remember to remove WRAPPER_GERNERATOR_DEBUG from the list.
    awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
2021-01-25 09:47:36 +01:00
asynts
7d783d8b84 Everywhere: Replace a bundle of dbg with dbgln.
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
2021-01-22 22:14:30 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Renamed from Libraries/LibWeb/HTML/Parser/HTMLTokenizer.cpp (Browse further)