Commit graph

32 commits

Author SHA1 Message Date
Shannon Booth
e800605ad3 AK+LibURL: Move AK::URL into a new URL library
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.

This change has two main benefits:
 * Moving AK back more towards being an agnostic library that can
   be used between the kernel and userspace. URL has never really fit
   that description - and is not used in the kernel.
 * URL _should_ depend on LibUnicode, as it needs punnycode support.
   However, it's not really possible to do this inside of AK as it can't
   depend on any external library. This change brings us a little closer
   to being able to do that, but unfortunately we aren't there quite
   yet, as the code generators depend on LibCore.
2024-03-18 14:06:28 -04:00
Timothy Flynn
e3b5e24ce0 AK: Iterate the bytes of a URL query with an unsigned type
Otherwise, we percent-encode negative signed chars incorrectly. For
example, https://www.strava.com/login contains the following hidden
<input> field:

    <input name="utf8" type="hidden" value="✓" />

On submitting the form, we would percent-encode that field as:

    utf8=%-1E%-64%-6D

Which would cause us to receive an HTTP 500 response. We now properly
percent-encode that field as:

    utf8=%E2%9C%93

And can login to Strava :^)
2024-03-10 15:17:31 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Tim Ledbetter
2a1fc96650 AK: Avoid unnecessary String allocations for URL username and password
Previously, `URLParser` was constructing a new String for every
character of the URL's username and password. This change improves
performance by eliminating those unnecessary String allocations.

A URL with a 100,000 character password can now be parsed in ~30ms vs
~8 seconds previously on my machine.
2023-11-06 09:19:12 +01:00
Shannon Booth
3748f1d290 AK: Check for overflow parsing IPv4 number in URL
Fixes OSS fuzz issue:
https://oss-fuzz.com/download?testcase_id=6045676088459264
2023-10-26 11:11:41 +02:00
Shannon Booth
9d60f23abc AK: Port URL::m_fragment from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Shannon Booth
21fe86d235 AK: Port URL::m_query from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Karol Kosek
eb41f0144b AK: Decode data URLs to separate class (and parse like every other URL)
Parsing 'data:' URLs took it's own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.

Because parsing 'data:' didn't use standard fields, running the
following JS code:

    new URL('#a', 'data:text/plain,hello').toString()

not only cleared the path as URLParser doesn't check for data from
data_payload() function (making the result be 'data:#a'), but it also
crashes the program because we forbid having an empty MIME type when we
serialize to string.

With this change, 'data:' URLs will be parsed like every other URLs.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which will return a struct containing MIME type with already
decoded data! :^)
2023-08-01 14:19:05 +02:00
Karol Kosek
58017a0581 AK: Clear buffer after leaving CannotBeABaseUrlPath in URLParser
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').

This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.

data: URLs didn't have this problem, because we have a special case for
parsing them.
2023-08-01 10:10:07 +02:00
Shannon Booth
8751be09f9 AK: Serialize URL hosts with 'concept-host-serializer'
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.

Making that change resulted in a whole bunch of fallout.

After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
2023-07-31 05:18:51 +02:00
Shannon Booth
177b04dcfc AK: Fix url host parsing check for 'ends in a number'
I misunderstood the spec step for checking whether the host 'ends with a
number'. We can't simply check for it if ends with a number, this check
is actually an algorithm which is required to avoid detecting hosts that
end with a number from an IPv4 host.

Implement this missing step, and add a test to cover this.
2023-07-25 06:43:50 -04:00
Shannon Booth
8d2ccf0f4f AK: Implement IPV4 host URL parsing to specification
This implements both the parsing and serialization IPV4 parts from
the URL spec.
2023-07-24 17:07:16 -04:00
Andreas Kling
f0ec104131 AK: Implement IPv6 host parsing in URLParser
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.

Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
2023-07-17 07:47:58 +02:00
Shannon Booth
5625ca5cb9 AK: Rename URLParser::parse to URLParser::basic_parse
To make it more clear that this function implements
'concept-basic-url-parser' instead of 'concept-url-parser'.
2023-07-15 09:45:16 +02:00
Valtteri Koskivuori
5e5493e334 AK: Add URLParser relative file URL test
I was debugging a different issue in Ladybird, and noticed that
completing relative file URLs with URL::complete_url didn't seem to work
right. This test case covers both the working https case, as well as the
file URL case fixed by the previous commit.
2023-06-18 15:16:08 +02:00
MacDue
5db1eb9961 AK+Everywhere: Replace URL::paths() with path_segment_at_index()
This allows accessing and looping over the path segments in a URL
without necessarily allocating a new vector if you want them percent
decoded too (which path_segment_at_index() has an option for).
2023-04-15 06:37:04 +02:00
MacDue
35612c6a7f AK+Everywhere: Change URL::path() to serialize_path()
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.

The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
2023-04-15 06:37:04 +02:00
MacDue
8283e8b88c AK: Don't store parts of URLs percent decoded
As noted in serval comments doing this goes against the WC3 spec,
and breaks parsing then re-serializing URLs that contain percent
encoded data, that was not encoded using the same character set as
the serializer.

For example, previously if you had a URL like:

https:://foo.com/what%2F%2F (the path is what + '//' percent encoded)

Creating URL("https:://foo.com/what%2F%2F").serialize() would return:

https://foo.com/what//

Which is incorrect and not the same as the URL we passed. This is
because the re-serializing uses the PercentEncodeSet::Path which
does not include '/'.

Only doing the percent encoding in the setters fixes this, which
is required to navigate to Google Street View (which includes a
percent encoded URL in its URL).

Seems to fix #13477 too
2023-04-12 07:40:22 +02:00
networkException
9915fa72fb AK+Everywhere: Use Optional for URLParser::parse's base_url parameter 2023-04-11 16:28:20 +02:00
Sam Atkins
abc01cc9fe AK+Tests+LibWeb: Make URL::complete_url() take a StringView
All it does is pass this to `URLParser::parse()` which takes a
StringView, so we might as well take one here too.
2023-02-15 12:48:26 -05:00
Thiago Henrique Hupner
401bc13776 AK: Use base URL when the specified URL is empty 2023-01-06 13:59:17 -07:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Andreas Kling
287a9b552a AK: Fix bad parsing of some file:/// URLs with base URL
We were dropping the base URL path components in the resulting URL due
to mistakenly determining the input URL to start with a Windows drive
letter. Fix this, add a spec link, and a test.
2022-09-20 15:38:53 +02:00
sin-ack
604aac531c AK+Userland+Tests: Remove URL(char const*) constructor
The StringView(char const*) constructor is being removed, and there was
only a few users of this left, which are also cleaned up in this commit.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Andreas Kling
f2663f477f AK: Ignore whitespace while decoding base64
This matches how other implementations behave.

1% progression on ACID3. :^)
2022-02-25 19:54:13 +01:00
Idan Horowitz
d6cfa34667 AK: Make URL::m_port an Optional<u16>, Expose raw port getter
Our current way of signalling a missing port with m_port == 0 was
lacking, as 0 is a valid port number in URLs.
2021-09-14 00:14:45 +02:00
TheFightingCatfish
4e8e1b7b3a AK: Improve the parsing of data urls
Improve the parsing of data urls in URLParser to bring it more up-to-
spec. At the moment, we cannot parse the components of the MIME type
since it is represented as a string, but the spec requires it to be
parsed as a "MIME type record".
2021-08-06 10:45:17 +02:00
Max Wipfli
99d5555134 AK: Do not trim away non-ASCII bytes when parsing URL
Because non-ASCII code points have negative byte values, trimming away
control characters requires checking for negative bytes values.

This also adds a test case with a URL containing non-ASCII code points.
2021-06-05 10:53:31 +02:00
Andreas Kling
c0d1a75881 AK: Strip leading/trailing C0-control-or-space in URLs correctly
We have to stop scanning once we hit a non-strippable character.
Add some tests to cover this.
2021-06-01 13:22:04 +02:00
Max Wipfli
c7857f3572 Tests: Add more tests for AK::URL
This adds more tests for AK::URL. Furthermore, this also changes some
tests to conform to what the reworked URL class does (and the URL
specification mostly expects).
2021-06-01 09:28:05 +02:00
Brian Gianforcaro
67322b0702 Tests: Move AK tests to Tests/AK 2021-05-06 17:54:28 +02:00
Renamed from AK/Tests/TestURL.cpp (Browse further)