Commit graph

478 commits

Author SHA1 Message Date
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere
2024-10-04 13:19:50 +02:00
Timothy Flynn
7b3b608caf AK: Do not coerce i64 and u64 values to i32 and u32
First, this isn't actually helpful, as we no longer store 32-bit values
in JsonValue. They are stored as 64-bit values anyways.

But more importantly, there was a bug here when trying to coerce an i64
to an i32. All negative values were cast to an i32, without checking if
the value is below NumericLimits<i32>::min.
2024-09-27 09:46:55 +01:00
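For illustration, the range check that was missing in the old coercion (7b3b608caf) can be sketched as below. This is a standalone sketch using the standard library rather than AK types, and `narrow_to_i32` is a hypothetical helper, not an AK function. The commit itself removes the coercion instead of adding such a check.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper, not part of AK: narrow an i64 to an i32 only when the
// value fits. Both bounds are checked; the bug described above amounted to
// skipping the lower-bound check for negative values.
constexpr std::optional<int32_t> narrow_to_i32(int64_t value)
{
    if (value < std::numeric_limits<int32_t>::min())
        return std::nullopt;
    if (value > std::numeric_limits<int32_t>::max())
        return std::nullopt;
    return static_cast<int32_t>(value);
}

// Small negative values fit; values below the i32 minimum do not.
static_assert(narrow_to_i32(-1).value() == -1);
static_assert(!narrow_to_i32(-3'000'000'000LL).has_value());
```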
Asutosh Variar
229b64a4b7 Everywhere: Convert from_string_view -> from_string_literal where static 2024-09-11 10:59:04 +01:00
Timothy Flynn
d265575269 AK: Add a Base64 decoder to decode into an existing buffer
Some callers (LibJS) will want to control the size of the output buffer,
to decode up to a maximum length. They will also want to receive partial
results in the case of an error. This patch adds a method to provide
those capabilities, and makes the existing implementation use it.
2024-09-03 17:43:03 +02:00
Timothy Flynn
41e14e3fc3 AK: Add an option to the base64 encoder to omit padding
Will be used by an upcoming JS prototype
2024-09-03 17:43:03 +02:00
Andrew Kaster
782926601d Tests: Convert Swift tests to use Testing module where possible
The AK tests can't seem to use it because it crashes the frontend :)
2024-08-28 21:27:35 -06:00
Andrew Kaster
315a666e53 Tests: Add test to verify CxxSequence protocol conformance of containers
Building the test in debug mode currently crashes the swift frontend,
so we'll need to build this in release mode until that's fixed.
2024-08-17 17:44:37 -06:00
Andrew Kaster
756ef2c722 AK: Conform SimpleIterator to the random access iterator requirements
This requires pulling in some of the STL, but the result is that our
iterator is now STL Approved ™️ and our containers can be
auto-conformed to Swift protocols.
2024-08-17 17:44:37 -06:00
Timothy Flynn
d57d14fc19 AK: Define FloatingPointExponentialForm comparator in the AK namespace
This isn't an issue now because this is only invoked from a macro that
is expanded within this file. But in an upcoming commit, it will be
invoked from a helper function in the Test namespace. At that point, the
compiler complains about the comparator not being found (and helpfully
indicates we should move this one to the AK namespace to allow ADL to
succeed).
2024-08-13 14:11:05 +02:00
Timothy Flynn
831e5ed4e2 AK: Allow comparing spans of different constness
Otherwise, the following code would not compile:

    constexpr Array<int, 3> array { 4, 5, 6 };
    Vector<int> vector { 4, 5, 6 };

    if (array == vector.span()) { }

We do such comparisons in tests quite a bit. But it currently doesn't
become an issue because of the way EXPECT_EQ copies its input parameters
to non-const locals. In a future patch, that copying will be removed,
and the compiler would otherwise complain about not finding a suitable
comparison operator.
2024-08-13 14:11:05 +02:00
Shannon Booth
b3bf5c4ea8 AK: Add BOM handling to String::from_utf8_with_replacement_character 2024-08-12 06:38:58 -04:00
Shannon Booth
033ea0e7fb AK: Add String::from_utf8_with_replacement_character
This takes a byte sequence and converts it to a UTF-8 string with the
replacement character.
2024-08-10 10:39:43 +02:00
Timothy Flynn
7a17c654d2 AK: Add a method to compute UTF-16 length from a UTF-8 string 2024-07-31 05:55:34 -04:00
Timothy Flynn
74d644a216 AK: Explicitly check for null data in Utf16View
The underlying CPU-specific instructions for operating on UTF-16 strings
behave differently for null inputs. Add an explicit check for this state
for consistency.
2024-07-21 19:57:07 +02:00
Timothy Flynn
144452d638 AK: Explicitly check for null data in Utf8View
The underlying CPU-specific instructions for operating on UTF-8 strings
behave differently for null inputs. Add an explicit check for this state
for consistency.
2024-07-21 19:57:07 +02:00
Timothy Flynn
71c29504af AK: Support non-native endianness in Utf16View
Utf16View currently assumes host endianness. Add support for specifying
either big or little endianness (which we mostly just pipe through to
simdutf). This will allow using simdutf facilities with LibTextCodec.
2024-07-18 19:43:57 +02:00
Timothy Flynn
0c14a9417a AK: Replace converting to and from UTF-16 with simdutf
The one behavior difference is that we will now actually fail on invalid
code units with Utf16View::to_utf8(AllowInvalidCodeUnits::No). It was
arguably a bug that this wasn't already the case.
2024-07-18 14:46:25 +02:00
Andrew Kaster
88044f59c6 AK: Stop exporting AK::FixedPoint into the global namespace
This declaration has conflicts with the macOS SDK, which becomes a
problem when trying to interact with system clang modules.
2024-07-18 09:43:38 +01:00
Andrew Kaster
bf600c8e1d AK: Stop exporting AK::Duration into the global namespace
This has conflicts with MacTypes.h from the Apple macOS SDKs, which
becomes a huge problem when trying to interact with system clang modules
2024-07-18 09:43:38 +01:00
Timothy Flynn
bfc9dc447f AK+LibWeb: Replace our home-grown base64 encoder/decoders with simdutf
We currently have 2 base64 coders: one in AK, another in LibWeb for a
"forgiving" implementation. ECMA-262 has an upcoming proposal which will
require a third implementation.

Instead, let's use the base64 implementation that is used by Node.js and
recommended by the upcoming proposal. It handles forgiving decoding as
well.

Our users of AK's implementation should be fine with the forgiving
implementation. The AK impl originally had naive forgiving behavior, but
that was removed solely for performance reasons.

Using http://mattmahoney.net/dc/enwik8.zip (100MB unzipped) as a test,
performance of our old home-grown implementations vs. the simdutf
implementation (on Linux x64):

                Encode    Decode
AK base64       0.226s    0.169s
LibWeb base64   N/A       1.244s
simdutf         0.161s    0.047s
2024-07-16 10:27:39 +02:00
Dennis Camera
b54a1c6284 AK: Implement ShortString for big-endian 2024-07-05 09:49:23 -06:00
Timothy Flynn
698a95d2de AK: Decode paired UTF-16 surrogates in a JSON string
For example, such use is seen on Twitter.
2024-07-04 14:16:16 +02:00
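As a rough sketch of what decoding a paired surrogate (698a95d2de) involves: two consecutive `\uXXXX` escapes in a JSON string, a high surrogate followed by a low surrogate, combine into one code point. The helper name `combine_surrogates` is an assumption made for this example and is not taken from the AK JSON parser.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not the AK JSON parser: combine a UTF-16 surrogate pair
// (as written in JSON via two \uXXXX escapes) into a single code point.
constexpr std::optional<uint32_t> combine_surrogates(uint16_t high, uint16_t low)
{
    bool high_ok = high >= 0xD800 && high <= 0xDBFF;
    bool low_ok = low >= 0xDC00 && low <= 0xDFFF;
    if (!high_ok || !low_ok)
        return std::nullopt;

    // Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000u
        + ((static_cast<uint32_t>(high) - 0xD800u) << 10)
        + (static_cast<uint32_t>(low) - 0xDC00u);
}

// "\uD83D\uDE00" in a JSON string decodes to U+1F600.
static_assert(combine_surrogates(0xD83D, 0xDE00).value() == 0x1F600u);
static_assert(!combine_surrogates(0x0041, 0xDE00).has_value());
```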
Zaggy1024
bbd8a218a5 AK: Prevent overflow of the min when clamping unsigned values to signed
Also, add some tests for the cases that were broken before.
2024-06-24 12:41:32 -06:00
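The log does not show the exact code path fixed in bbd8a218a5, so the following is only a sketch of one way such an overflow can arise and be avoided: converting a signed minimum to an unsigned type wraps around, so the minimum is never compared against an unsigned input. `clamp_unsigned_to_signed` is a hypothetical name, not the signature of `AK::clamp_to`.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical sketch, not AK::clamp_to itself.
template<typename Output, typename Input>
constexpr Output clamp_unsigned_to_signed(Input value)
{
    static_assert(std::is_unsigned_v<Input>, "Input must be unsigned");
    static_assert(std::is_signed_v<Output>, "Output must be signed");

    // An unsigned value can never be below a signed minimum, so the minimum
    // is not compared at all. Converting it to an unsigned type would wrap
    // around to a large positive number.
    using UnsignedOutput = std::make_unsigned_t<Output>;
    constexpr auto max = static_cast<UnsignedOutput>(std::numeric_limits<Output>::max());
    if (value > max)
        return std::numeric_limits<Output>::max();
    return static_cast<Output>(value);
}

static_assert(clamp_unsigned_to_signed<int8_t>(uint32_t { 200 }) == 127);
static_assert(clamp_unsigned_to_signed<int64_t>(uint8_t { 5 }) == 5);
```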
Zaggy1024
172f4588a7 Tests/AK: Add some quick tests for AK::clamp_to 2024-06-24 12:41:32 -06:00
Timothy Flynn
5cf818e305 LibUnicode: Replace case transformations and comparison with ICUs
There are a couple of differences here due to using ICU:

1. Titlecasing behaves slightly differently. We previously transformed
   "123dollars" to "123Dollars", as we would use word segmentation to
   split a string into words, then transform the first cased character
   to titlecase. ICU doesn't go quite that far, and leaves the string
   as "123dollars". While this is a behavior change, the only user of
   this API is the `text-transform: capitalize;` CSS rule, and we now
   match the behavior of other browsers.

2. There isn't an API to compare strings with case insensitivity without
   allocating case-folded strings for both the left- and right-hand-side
   strings. Our implementation was previously allocation-free; however,
   in a benchmark, ICU is still ~1.4x faster.
2024-06-20 10:59:55 +02:00
Andreas Kling
b88e0eb50a AK: Remove unused Complex.h 2024-06-18 12:00:14 +02:00
Andreas Kling
fe1aec124e AK: Remove unused ArbitrarySizedEnum class 2024-06-18 12:00:14 +02:00
Diego
7560b640f3 AK: Add AllowSurrogates to UTF-8 validator
The [UTF-8](https://datatracker.ietf.org/doc/html/rfc3629#page-5)
standard says to reject strings with upper or lower surrogates. However,
in many standards, ECMAScript included, unpaired surrogates (and
therefore UTF-8 surrogates) are allowed in strings. So, this commit
extends the UTF-8 validation API with `AllowSurrogates`, which controls
whether upper and lower surrogate characters are rejected.
2024-06-09 12:16:32 +02:00
Daniel Bertalan
376b956214 Tests: Stop invoking UB in AK::NeverDestroyed's tests
Instead of attempting a stack use-after-free by reading an out-of-scope
object's data member, let's keep a flag that checks if the destructor
had been called in the outer scope.

Fixes #64
2024-06-05 17:19:14 -06:00
Andreas Kling
6321e97b09 AK: Remove various unused things 2024-06-04 09:19:39 +02:00
Timothy Flynn
fe3fde2411 AK+LibUnicode: Implement a case-insensitive variant of find_byte_offset
The existing String::find_byte_offset is case-sensitive. This variant
allows performing searches using Unicode-aware case folding.
2024-06-01 07:37:54 +02:00
Tim Ledbetter
817bfef3aa Tests/AK: Add tests for integral log2 2024-05-21 09:31:17 +02:00
Tim Ledbetter
d0d81e470e AK: Fix off by one error in integral ceil_log2()
Previously, certain values of `ceil_log2(x)` would be 1 smaller than
`ceil(log2(x))`.
2024-05-21 09:31:17 +02:00
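For reference, an integral `ceil_log2` that agrees with `ceil(log2(x))` can be sketched as follows. This is an independent illustration, not the AK implementation fixed in d0d81e470e; it assumes `x > 0` and uses a plain loop rather than a count-leading-zeros intrinsic.

```cpp
#include <cstdint>

// Sketch only: the smallest n such that 2^n >= x. Precondition: x > 0.
constexpr unsigned ceil_log2(uint64_t x)
{
    // floor(log2(x)): how many times x can be halved before reaching 1.
    unsigned floor_log2 = 0;
    for (uint64_t v = x; v > 1; v >>= 1)
        ++floor_log2;

    // Exact powers of two need no rounding up; everything else does.
    bool is_power_of_two = (x & (x - 1)) == 0;
    return is_power_of_two ? floor_log2 : floor_log2 + 1;
}

static_assert(ceil_log2(1) == 0);
static_assert(ceil_log2(4) == 2);
static_assert(ceil_log2(5) == 3);
```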
Abuneri
b5bed37074 AK: Replace FP math in is_power_of with a purely integral algorithm
The previous naive approach was causing test failures because of
rounding issues in some exotic environments. In particular, MSVC
via MSBuild
2024-05-07 16:43:34 -06:00
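A purely integral power check in the spirit of b5bed37074 could look like the sketch below. The signature is an assumption for this example (AK's `is_power_of` may take the base differently), and the sketch simply returns false for bases below 2.

```cpp
#include <cstdint>

// Sketch only: true if x == base^n for some n >= 0, with no floating-point
// math. Bases below 2 are treated as "never a power" to keep the loop simple.
constexpr bool is_power_of(uint64_t base, uint64_t x)
{
    if (base < 2 || x == 0)
        return false;

    // Strip factors of base; a true power reduces to exactly 1.
    while (x % base == 0)
        x /= base;
    return x == 1;
}

static_assert(is_power_of(3, 27));
static_assert(is_power_of(10, 1));
static_assert(!is_power_of(3, 28));
```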
Nico Weber
88d0702763 AK: Make ceil_div() handle one argument being negative correctly
`ceil_div(-1, 2)` used to return -1.
Now it returns 0, which is the correct ceil(-0.5).

(C++'s division semantics have floor semantics for numbers > 0,
but ceil semantics for numbers < 0.)

This will be important for the JPEG2000 decoder eventually.
2024-04-27 07:09:08 +02:00
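The rounding rule described in 88d0702763 can be sketched as follows. This is an illustration written for this log, not a copy of AK's `ceil_div`: since C++ division truncates toward zero, the quotient only needs to be bumped when the exact result is positive and there is a remainder.

```cpp
// Sketch only, assuming a signed or unsigned integral T and b != 0.
template<typename T>
constexpr T ceil_div(T a, T b)
{
    T quotient = a / b;  // truncates toward zero
    T remainder = a % b;

    // When the operands have the same sign, the exact result is positive and
    // truncation rounded it down, so round up. When the signs differ,
    // truncation already rounded toward zero, which is the ceiling.
    if (remainder != 0 && ((a > 0) == (b > 0)))
        ++quotient;
    return quotient;
}

static_assert(ceil_div(-1, 2) == 0);
static_assert(ceil_div(1, 2) == 1);
static_assert(ceil_div(-3, -2) == 2);
```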
Nico Weber
f2ebad11a8 Tests/AK: Add some basic ceil_div() tests 2024-04-27 07:09:08 +02:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
dgaston
08aaf4fb07 AK: Add methods to BufferedStream to resize the user supplied buffer
These changes allow lines of arbitrary length to be read with
BufferedStream. When the user supplied buffer is smaller than
the line, it will be resized to fit the line. When the internal
buffer in BufferedStream is smaller than the line, it will be
read into the user supplied buffer chunk by chunk with the
buffer growing accordingly.

Other behaviors match the behavior of the existing read_line method.
2024-04-21 11:46:55 +02:00
Hendiadyoin1
f95abe8c0e AK: Make BigIntBase more agnostic to non native word sizes
This will allow us to use it in Crypto::UnsignedBigInteger, which always
uses 32 bit words
2024-03-25 14:26:29 -06:00
Timothy Flynn
7e38653492 AK: Reject invalid Base64 encoded string lengths 2024-03-25 08:13:27 +01:00
Timothy Flynn
754ff41b9c AK: Remove whitespace skipping feature from AK's Base64 decoder
This was added in commit f2663f477f as a
partial implementation of what is now LibWeb's forgiving Base64 decoder.
All use cases within LibWeb that require whitespace skipping now use
that implementation instead.

Removing this feature from AK allows us to know the exact output size of
a decoded Base64 string. We can still trim whitespace at the start and
end of the input though; for example, this is useful when reading from a
file that may have a newline at the end of the file.
2024-03-25 08:13:27 +01:00
Dan Klishch
45a0ba2167 AK: Introduce AK::enumerate
Co-Authored-By: Tim Flynn <trflynn89@pm.me>
2024-03-23 09:02:58 -04:00
Andrew Kaster
e9b16970fe AK: Add base64url encoding and decoding methods
This encoding scheme comes from section 5 of RFC 4648, as an
alternative to the standard base64 encode/decode methods.

The only difference is that the last two characters are replaced
with '-' and '_', as '+' and '/' are not safe in URLs or filenames.
2024-03-20 12:18:57 -04:00
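To make the alphabet difference from e9b16970fe concrete, the sketch below converts already-encoded standard base64 text into the base64url alphabet. The function names are hypothetical and this is not how AK implements its base64url methods; it only demonstrates the two-character substitution.

```cpp
#include <cassert>
#include <string>

// Map one character of the standard alphabet to the URL-safe alphabet.
constexpr char to_base64url_char(char c)
{
    return c == '+' ? '-' : c == '/' ? '_' : c;
}

// Hypothetical helper: rewrite standard base64 text using the base64url
// alphabet from RFC 4648 section 5. Padding is left untouched.
inline std::string base64_to_base64url(std::string encoded)
{
    for (char& c : encoded)
        c = to_base64url_char(c);
    return encoded;
}

inline void base64url_example()
{
    // The bytes 0xFB 0xFF encode to "+/8=" in standard base64.
    assert(base64_to_base64url("+/8=") == "-_8=");
}
```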
Shannon Booth
e800605ad3 AK+LibURL: Move AK::URL into a new URL library
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.

This change has two main benefits:
 * Moving AK back more towards being an agnostic library that can
   be used between the kernel and userspace. URL has never really fit
   that description - and is not used in the kernel.
 * URL _should_ depend on LibUnicode, as it needs punycode support.
   However, it's not really possible to do this inside of AK as it can't
   depend on any external library. This change brings us a little closer
   to being able to do that, but unfortunately we aren't there quite
   yet, as the code generators depend on LibCore.
2024-03-18 14:06:28 -04:00
Timothy Flynn
e4213f5767 AK: Generalize Span::contains_slow to use the Traits infrastructure
This allows, for example, checking if a Span<String> contains a value
without having to allocate a String.
2024-03-16 08:42:33 +01:00
Timothy Flynn
e3b5e24ce0 AK: Iterate the bytes of a URL query with an unsigned type
Otherwise, we percent-encode negative signed chars incorrectly. For
example, https://www.strava.com/login contains the following hidden
<input> field:

    <input name="utf8" type="hidden" value="✓" />

On submitting the form, we would percent-encode that field as:

    utf8=%-1E%-64%-6D

Which would cause us to receive an HTTP 500 response. We now properly
percent-encode that field as:

    utf8=%E2%9C%93

And can login to Strava :^)
2024-03-10 15:17:31 +01:00
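The signedness issue in e3b5e24ce0 can be reproduced outside AK with a small sketch. Everything here is an assumption for illustration: `percent_encode` and `is_unreserved` are hypothetical names, and the "keep as-is" set is a simplification rather than the URL specification's query percent-encode set. The relevant part is the cast to `unsigned char` before formatting; the UTF-8 bytes of "✓" are E2 9C 93, which are negative when viewed as signed chars.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Simplified "keep as-is" set, for illustration only.
constexpr bool is_unreserved(unsigned char byte)
{
    return (byte >= 'A' && byte <= 'Z') || (byte >= 'a' && byte <= 'z')
        || (byte >= '0' && byte <= '9')
        || byte == '-' || byte == '.' || byte == '_' || byte == '~';
}

// Hypothetical helper, not the AK/LibURL implementation.
inline std::string percent_encode(std::string_view input)
{
    std::string output;
    for (char c : input) {
        // Convert to an unsigned byte first. Formatting a negative signed
        // char as hex is what produced output like "%-1E".
        auto byte = static_cast<unsigned char>(c);
        if (is_unreserved(byte)) {
            output.push_back(c);
            continue;
        }
        char buffer[4]; // '%' + two hex digits + NUL
        std::snprintf(buffer, sizeof(buffer), "%%%02X", static_cast<unsigned>(byte));
        output += buffer;
    }
    return output;
}

inline void percent_encode_example()
{
    // U+2713 CHECK MARK, spelled as its UTF-8 bytes.
    assert(percent_encode("\xE2\x9C\x93") == "%E2%9C%93");
}
```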
Timothy Flynn
82ea53cf10 AK: Add a StringView method to count the number of lines in a string
We already have a helper to split a StringView by line while considering
"\n", "\r", and "\r\n". Add an analogous method to just count the number
of lines in the same manner.
2024-03-08 14:43:33 -05:00
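A line counter that treats "\n", "\r", and "\r\n" as terminators (82ea53cf10) might be sketched like this. It is written against `std::string_view` rather than AK's `StringView`, and the edge-case behavior is an assumption of the sketch: an empty input counts as zero lines, and trailing text without a terminator counts as one more line. The actual AK method may define these cases differently.

```cpp
#include <cstddef>
#include <string_view>

// Sketch only; name and edge-case semantics are assumptions.
constexpr std::size_t count_lines(std::string_view text)
{
    std::size_t lines = 0;
    bool has_unterminated_text = false;

    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\r') {
            ++lines;
            has_unterminated_text = false;
            // Treat "\r\n" as a single terminator.
            if (i + 1 < text.size() && text[i + 1] == '\n')
                ++i;
        } else if (c == '\n') {
            ++lines;
            has_unterminated_text = false;
        } else {
            has_unterminated_text = true;
        }
    }
    return lines + (has_unterminated_text ? 1 : 0);
}

static_assert(count_lines("a\nb") == 2);
static_assert(count_lines("a\r\nb\rc\n") == 3);
```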
Andrew Kaster
21ac431fac AK: Allow reading from EOF buffered streams better in read_line()
If the BufferedStream is able to fill its entire circular buffer in
populate_read_buffer() and is later asked to read a line or read until
a delimiter, it could erroneously return EMSGSIZE if the caller's buffer
was smaller than the internal buffer. In this case, all we really care
about is whether the caller's buffer is big enough for however much data
we're going to copy into it. Which needs to take into account the
candidate.
2024-02-26 13:16:27 -07:00
Nico Weber
986872800e Tests/AK: Add a test for the array ctor deduction guide 2024-02-11 18:53:00 +01:00
Nico Weber
d84b69ace9 AK: Add to_array()
This is useful if you want an array with an explicit type but still
want its size to be inferred.
2024-02-11 18:53:00 +01:00
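The idea behind d84b69ace9, an explicit element type with an inferred size, can be sketched with the standard library. `make_array` below is a hypothetical stand-in built on `std::array`; it is not AK's `to_array()` and makes no claim about AK's signature.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// Copy each element of the source array into a std::array of the same size.
template<typename T, std::size_t N, std::size_t... Indices>
constexpr std::array<T, N> make_array_impl(T const (&values)[N], std::index_sequence<Indices...>)
{
    return { { values[Indices]... } };
}

// Hypothetical helper: T is given explicitly, N is deduced from the
// initializer list.
template<typename T, std::size_t N>
constexpr std::array<T, N> make_array(T const (&values)[N])
{
    return make_array_impl(values, std::make_index_sequence<N> {});
}

// The element type is forced to long; the size 3 is inferred.
constexpr auto numbers = make_array<long>({ 1, 2, 3 });
static_assert(std::is_same_v<decltype(numbers), std::array<long, 3> const>);
static_assert(numbers[2] == 3);
```

Plain class template argument deduction, as in `std::array { 1, 2, 3 }`, would infer the size as well but would also infer the element type as `int`.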