Commit graph

490 commits

Author SHA1 Message Date
Sam Atkins
3f10a5701d AK: Add Utf8View::for_each_split_view() method
Returns one Utf8View at a time, using a callback function to identify
code points to split on.
2024-11-15 23:18:29 +01:00
Sam Atkins
ec5101a1d3 AK: Ensure empty StringViews all compare as equal
Before this change, a StringView with a character-data pointer would
never compare as equal to one with a null pointer, even if they were
both length 0. This could happen for example if one is
default-initialized, and the other is created as a substring.
2024-11-15 23:18:29 +01:00
Andrew Kaster
1ea236e454 AK: Skip test for StringView's CxxSequence conformance for now
This should be fixed on swiftlang/swift main later this week.
2024-11-15 10:51:45 -07:00
stasoid
31bf40b659 AK: Make LexicalPath::relative_path() fallible 2024-11-09 12:42:27 -07:00
Pavel Shliak
672590c360 Tests: Remove duplicated test for FloatingPointParsing 2024-11-06 09:22:44 +00:00
Pavel Shliak
cf5521ec68 Tests: Remove duplicated test for StringView 2024-11-06 09:22:44 +00:00
Timothy Flynn
cfcb29bdfd AK+LibUnicode: Add a method to trim non-ASCII whitespace from a String
Required by WebDriver.
2024-11-03 20:42:46 -05:00
Jelle Raaijmakers
c7773d0312 Meta: Update my email address everywhere 2024-11-01 12:14:53 +01:00
Jonne Ransijn
2457118024 AK: Add template specializations for Optional<{,Fly}String>
Slice the size of `Optional<{,Fly}String>` in half by introducing
`UINTPTR_MAX` as an invalid bit pattern for these values.
2024-10-31 23:26:22 +01:00
Andreas Kling
073bcfd386 AK+LibWeb: Add {Fly,}String::to_ascii_{upper,lower}_case()
These don't have to worry about the input not being valid UTF-8 and
so can be infallible (and can even return self if no changes needed.)

We use this instead of Infra::to_ascii_{upper,lower}_case in LibWeb.
2024-10-14 20:47:35 +02:00
Andreas Kling
dd419b5a8d AK: Make String::number() infallible
This API will always succeed in creating a String representing the
provided number in base-10.
2024-10-14 20:47:35 +02:00
Jelle Raaijmakers
f88acedc8f AK: Remove unused floating point conversion code
Currently I don't expect this code to be ever used in Ladybird.
2024-10-08 19:02:51 +02:00
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere
Some checks are pending
CI / Lagom (false, FUZZ, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (false, NO_FUZZ, macos-14, macOS, Clang) (push) Waiting to run
CI / Lagom (false, NO_FUZZ, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (true, NO_FUZZ, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (macos-14, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Push notes / build (push) Waiting to run
2024-10-04 13:19:50 +02:00
Timothy Flynn
7b3b608caf AK: Do not coerce i64 and u64 values to i32 and u32
First, this isn't actually helpful, as we no longer store 32-bit values
in JsonValue. They are stored as 64-bit values anyways.

But more imporatantly, there was a bug here when trying to coerce an i64
to an i32. All negative values were cast to an i32, without checking if
the value is below NumericLimits<i32>::min.
2024-09-27 09:46:55 +01:00
Asutosh Variar
229b64a4b7 Everywhere: Convert from_string_view -> from_string_literal where static 2024-09-11 10:59:04 +01:00
Timothy Flynn
d265575269 AK: Add a Base64 decoder to decode into an existing buffer
Some callers (LibJS) will want to control the size of the output buffer,
to decode up to a maximum length. They will also want to receive partial
results in the case of an error. This patch adds a method to provide
those capabilities, and makes the existing implementation use it.
2024-09-03 17:43:03 +02:00
Timothy Flynn
41e14e3fc3 AK: Add an option to the base64 encoder to omit padding
Will be used by an upcoming JS prototype
2024-09-03 17:43:03 +02:00
Andrew Kaster
782926601d Tests: Convert Swift tests to use Testing module where possible
The AK tests can't seem to use it because it crashes the frontend :)
2024-08-28 21:27:35 -06:00
Andrew Kaster
315a666e53 Tests: Add test to verify CxxSequence protocol conformance of containers
Building the test in debug mode currently crashes the swift frontend,
so we'll need to build this in release mode until that's fixed.
2024-08-17 17:44:37 -06:00
Andrew Kaster
756ef2c722 AK: Conform SimpleIterator to the random access iterator requirements
This requires pulling in some of the STL, but the result is that our
iterator is now STL Approved ™️ and our containers can be
auto-conformed to Swift protocols.
2024-08-17 17:44:37 -06:00
Timothy Flynn
d57d14fc19 AK: Define FloatingPointExponentialForm comparator in the AK namespace
This isn't an issue now because this is only invoked from a macro that
is expanded within this file. But in an upcoming commit, it will be
invoked from a helper function in the Test namespace. At that point, the
compiler complains about the comparitor not being found (and helpfully
indicates we should move this one to the AK namespace to allow ADL to
succeed).
2024-08-13 14:11:05 +02:00
Timothy Flynn
831e5ed4e2 AK: Allow comparing spans of different constness
Otherwise, the following code would not compile:

    constexpr Array<int, 3> array { 4, 5, 6 };
    Vector<int> vector { 4, 5, 6 };

    if (array == vector.span()) { }

We do such comparisons in tests quite a bit. But it currently doesn't
become an issue because of the way EXPECT_EQ copies its input parameters
to non-const locals. In a future patch, that copying will be removed,
and the compiler would otherwise complain about not finding a suitable
comparison operator.
2024-08-13 14:11:05 +02:00
Shannon Booth
b3bf5c4ea8 AK: Add BOM handling to String::from_utf8_with_replacement_character 2024-08-12 06:38:58 -04:00
Shannon Booth
033ea0e7fb AK: Add String::from_utf8_with_replacement_character
This takes a byte sequence and converts it to a UTF-8 string with the
replacement character.
2024-08-10 10:39:43 +02:00
Timothy Flynn
7a17c654d2 AK: Add a method to compute UTF-16 length from a UTF-8 string 2024-07-31 05:55:34 -04:00
Timothy Flynn
74d644a216 AK: Explicitly check for null data in Utf16View
The underlying CPU-specific instructions for operating on UTF-16 strings
behave differently for null inputs. Add an explicit check for this state
for consistency.
2024-07-21 19:57:07 +02:00
Timothy Flynn
144452d638 AK: Explicitly check for null data in Utf8View
The underlying CPU-specific instructions for operating on UTF-8 strings
behave differently for null inputs. Add an explicit check for this state
for consistency.
2024-07-21 19:57:07 +02:00
Timothy Flynn
71c29504af AK: Support non-native endianness in Utf16View
Utf16View currently assumes host endianness. Add support for specifying
either big or little endianness (which we mostly just pipe through to
simdutf). This will allow using simdutf facilities with LibTextCodec.
2024-07-18 19:43:57 +02:00
Timothy Flynn
0c14a9417a AK: Replace converting to and from UTF-16 with simdutf
The one behavior difference is that we will now actually fail on invalid
code units with Utf16View::to_utf8(AllowInvalidCodeUnits::No). It was
arguably a bug that this wasn't already the case.
2024-07-18 14:46:25 +02:00
Andrew Kaster
88044f59c6 AK: Stop exporting AK::FixedPoint into the global namespace
This declaration has conflicts with the macOS SDK, which becomes a
problem when trying to interact with system clang modules.
2024-07-18 09:43:38 +01:00
Andrew Kaster
bf600c8e1d AK: Stop exporting AK::Duration into the global namespace
This has conflicts with MacTypes.h from the Apple macOS SDKs, which
becomes a huge problem when trying to interact with system clang modules
2024-07-18 09:43:38 +01:00
Timothy Flynn
bfc9dc447f AK+LibWeb: Replace our home-grown base64 encoder/decoders with simdutf
We currently have 2 base64 coders: one in AK, another in LibWeb for a
"forgiving" implementation. ECMA-262 has an upcoming proposal which will
require a third implementation.

Instead, let's use the base64 implementation that is used by Node.js and
recommended by the upcoming proposal. It handles forgiving decoding as
well.

Our users of AK's implementation should be fine with the forgiving
implementation. The AK impl originally had naive forgiving behavior, but
that was removed solely for performance reasons.

Using http://mattmahoney.net/dc/enwik8.zip (100MB unzipped) as a test,
performance of our old home-grown implementations vs. the simdutf
implementation (on Linux x64):

                Encode    Decode
AK base64       0.226s    0.169s
LibWeb base64   N/A       1.244s
simdutf         0.161s    0.047s
2024-07-16 10:27:39 +02:00
Dennis Camera
b54a1c6284 AK: Implement ShortString for big-endian 2024-07-05 09:49:23 -06:00
Timothy Flynn
698a95d2de AK: Decode paired UTF-16 surrogates in a JSON string
For example, such use is seen on Twitter.
2024-07-04 14:16:16 +02:00
Zaggy1024
bbd8a218a5 AK: Prevent overflow of the min when clamping unsigned values to signed
Also, add some tests for the cases that were broken before.
2024-06-24 12:41:32 -06:00
Zaggy1024
172f4588a7 Tests/AK: Add some quick tests for AK::clamp_to 2024-06-24 12:41:32 -06:00
Timothy Flynn
5cf818e305 LibUnicode: Replace case transformations and comparison with ICUs
There are a couple of differences here due to using ICU:

1. Titlecasing behaves slightly differently. We previously transformed
   "123dollars" to "123Dollars", as we would use word segmentation to
   split a string into words, then transform the first cased character
   to titlecase. ICU doesn't go quite that far, and leaves the string
   as "123dollars". While this is a behavior change, the only user of
   this API is the `text-transform: capitalize;` CSS rule, and we now
   match the behavior of other browsers.

2. There isn't an API to compare strings with case insensitivity without
   allocating case-folded strings for both the left- and right-hand-side
   strings. Our implementation was previously allocation-free; however,
   in a benchmark, ICU is still ~1.4x faster.
2024-06-20 10:59:55 +02:00
Andreas Kling
b88e0eb50a AK: Remove unused Complex.h 2024-06-18 12:00:14 +02:00
Andreas Kling
fe1aec124e AK: Remove unused ArbitrarySizedEnum class 2024-06-18 12:00:14 +02:00
Diego
7560b640f3 AK: Add AllowSurrogates to UTF-8 validator
The [UTF-8](https://datatracker.ietf.org/doc/html/rfc3629#page-5)
standard says to reject strings with upper or lower surrogates. However,
in many standards, ECMAScript included, unpaired surrogates (and
therefore UTF-8 surrogates) are allowed in strings. So, this commit
extends the UTF-8 validation API with `AllowSurrogates`, which will
reject upper and lower surrogate characters.
2024-06-09 12:16:32 +02:00
Daniel Bertalan
376b956214 Tests: Stop invoking UB in AK::NeverDestroyed's tests
Instead of attempting a stack use-after-free by reading an out-of-scope
object's data member, let's keep a flag that checks if the destructor
had been called in the outer scope.

Fixes #64
2024-06-05 17:19:14 -06:00
Andreas Kling
6321e97b09 AK: Remove various unused things 2024-06-04 09:19:39 +02:00
Timothy Flynn
fe3fde2411 AK+LibUnicode: Implement a case-insensitive variant of find_byte_offset
The existing String::find_byte_offset is case-sensitive. This variant
allows performing searches using Unicode-aware case folding.
2024-06-01 07:37:54 +02:00
Tim Ledbetter
817bfef3aa Tests/AK: Add tests for integral log2 2024-05-21 09:31:17 +02:00
Tim Ledbetter
d0d81e470e AK: Fix off by one error in integral ceil_log2()
Previously, certain values of `ceil_log2(x)` would be 1 smaller than
`ceil(log2(x))`.
2024-05-21 09:31:17 +02:00
Abuneri
b5bed37074 AK: Replace FP math in is_power_of with a purely integral algorithm
The previous naive approach was causing test failures because of
rounding issues in some exotic environments. In particular, MSVC
via MSBuild
2024-05-07 16:43:34 -06:00
Nico Weber
88d0702763 AK: Make ceil_div() handle one argument being negative correctly
`ceil_div(-1, 2)` used to return -1.
Now it returns 0, which is the correct ceil(-0.5).

(C++'s division semantics have floor semantics for numbers > 0,
but ceil semantics for numbers < 0.)

This will be important for the JPEG2000 decoder eventually.
2024-04-27 07:09:08 +02:00
Nico Weber
f2ebad11a8 Tests/AK: Add some basic ceil_div() tests 2024-04-27 07:09:08 +02:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
dgaston
08aaf4fb07 AK: Add methods to BufferedStream to resize the user supplied buffer
These changes allow lines of arbitrary length to be read with
BufferedStream. When the user supplied buffer is smaller than
the line, it will be resized to fit the line. When the internal
buffer in BufferedStream is smaller than the line, it will be
read into the user supplied buffer chunk by chunk with the
buffer growing accordingly.

Other behaviors match the behavior of the existing read_line method.
2024-04-21 11:46:55 +02:00