Commit graph

40 commits

Author SHA1 Message Date
Timothy Flynn
bfc9dc447f AK+LibWeb: Replace our home-grown base64 encoder/decoders with simdutf
We currently have 2 base64 coders: one in AK, another in LibWeb for a
"forgiving" implementation. ECMA-262 has an upcoming proposal which will
require a third implementation.

Instead, let's use the base64 implementation that is used by Node.js and
recommended by the upcoming proposal. It handles forgiving decoding as
well.

Our users of AK's implementation should be fine with the forgiving
implementation. The AK impl originally had naive forgiving behavior, but
that was removed solely for performance reasons.

Using http://mattmahoney.net/dc/enwik8.zip (100MB unzipped) as a test,
performance of our old home-grown implementations vs. the simdutf
implementation (on Linux x64):

                Encode    Decode
AK base64       0.226s    0.169s
LibWeb base64   N/A       1.244s
simdutf         0.161s    0.047s
2024-07-16 10:27:39 +02:00
Timothy Flynn
7e38653492 AK: Reject invalid Base64 encoded string lengths 2024-03-25 08:13:27 +01:00
Timothy Flynn
4ecf4c7617 AK: Compute the exact size of decoded Base64 strings 2024-03-25 08:13:27 +01:00
Timothy Flynn
754ff41b9c AK: Remove whitespace skipping feature from AK's Base64 decoder
This was added in commit f2663f477f as a
partial implementation of what is now LibWeb's forgiving Base64 decoder.
All use cases within LibWeb that require whitespace skipping now use
that implementation instead.

Removing this feature from AK allows us to know the exact output size of
a decoded Base64 string. We can still trim whitespace at the start and
end of the input though; for example, this is useful when reading from a
file that may have a newline at the end of the file.
2024-03-25 08:13:27 +01:00
Timothy Flynn
690db10463 AK: Convert Base64 template parameters to regular function parameters
The generated function name is otherwise very long, which makes stack
traces a bit more difficult to sift through.
2024-03-25 08:13:27 +01:00
Timothy Flynn
f292746134 AK: Convert some west-consts to east-const in Base64.cpp
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone), it just doesn't perform
the formatting to east-const itself.
2024-03-25 08:13:27 +01:00
Timothy Flynn
81ad6de41b AK: Avoid creating an intermediate buffer when decoding a Base64 string
There's no need to copy the result. We can also avoid increasing the
size of the output buffer by 1 for each written byte.

This reduces the runtime of `./bin/base64 -d enwik8.base64 >/dev/null`
from 0.917s to 0.632s.

(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
2024-03-21 15:53:46 +01:00
Timothy Flynn
0fd7ad09a0 AK: Avoid StringBuilder when creating a Base64-encoded string
We don't really need the features provided by StringBuilder here, since
we know the exact size of the output. Avoiding StringBuilder avoids the
recurring capacity/size checks both within StringBuilder itself and its
internal ByteBuffer.

This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
0.976s to 0.428s.

(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
2024-03-21 15:53:46 +01:00
Timothy Flynn
5f5b8ee9bb AK: Do not perform UTF-8 validation on Base64-encoded strings
We know we are only appending ASCII characters to the StringBuilder, so
do not bother validating the result.

This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
1.192s to 0.976s.

(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
2024-03-21 15:53:46 +01:00
Andrew Kaster
e9b16970fe AK: Add base64url encoding and decoding methods
This encoding scheme comes from section 5 of RFC 4648, as an
alternative to the standard base64 encode/decode methods.

The only difference is that the last two characters are replaced
with '-' and '_', as '+' and '/' are not safe in URLs or filenames.
2024-03-20 12:18:57 -04:00
Muhammad Zahalqa
0f0d16bbec AK: Include Array.h in Base64.h
Array.h should be included in Base64.h and removed from Base64.cpp
2023-05-18 22:49:02 +02:00
Arda Cinar
283187afc5 AK+LibWeb: Move decode forgiving base64 under Web::Infra namespace
Since the forgiving base64 is part of the web infra standard
2023-01-10 17:54:01 +00:00
Arda Cinar
4ab2954210 AK: Expose Base64 tables from Base64.h
This change is necessary to move the forgiving base64 decoder to LibWeb
2023-01-10 17:54:01 +00:00
Arda Cinar
bbaf86fb46 AK: Add a forgiving_base64_decode helper
According to the specification at
https://infra.spec.whatwg.org/#forgiving-base64
2022-12-28 21:15:02 +01:00
Jelle Raaijmakers
25f2e4981c AK: Stop using DeprecatedString in Base64 encoding 2022-12-20 10:34:19 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Lenny Maiorani
5b59375a56 AK: Fix implicit and narrowing conversions in Base64 2022-03-16 16:19:53 +00:00
Lenny Maiorani
8d1d4d4f09 AK: Make static constexpr variables to avoid stack copy in Base64
Alphabet and lookup table are created and copied to the stack on each
call. Create them and store them in static memory.
2022-03-16 16:19:53 +00:00
Andreas Kling
f2663f477f AK: Ignore whitespace while decoding base64
This matches how other implementations behave.

1% progression on ACID3. :^)
2022-02-25 19:54:13 +01:00
Sam Atkins
c388a879d7 AK+Userland: Make AK::decode_base64 return ErrorOr 2022-01-24 22:36:09 +01:00
Sam Atkins
45cf40653a Everywhere: Convert ByteBuffer factory methods from Optional -> ErrorOr
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
2022-01-24 22:36:09 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Ben Wiederhake
cb868cfa41 AK+Everywhere: Make Base64 decoding fallible 2021-10-23 19:16:40 +01:00
Ben Wiederhake
3bf1f7ae87 AK: Don't crash on invalid Base64 input
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.

Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979
2021-10-23 19:16:40 +01:00
Ali Mohammad Pur
97e97bccab Everywhere: Make ByteBuffer::{create_*,copy}() OOM-safe 2021-09-06 01:53:26 +02:00
Ali Mohammad Pur
50349de38c Meta: Disable -Wmaybe-uninitialized
It's prone to finding "technically uninitialized but can never happen"
cases, particularly in Optional<T> and Variant<Ts...>.
The general case seems to be that it cannot infer the dependency
between Variant's index (or Optional's boolean state) and a particular
alternative (or Optional's buffer) being untouched.
So it can flag cases like this:
```c++
if (index == StaticIndexForF)
    new (new_buffer) F(move(*bit_cast<F*>(old_buffer)));
```
The code in that branch can _technically_ make a partially initialized
`F`, but that path can never be taken since the buffer holding an
object of type `F` and the condition being true are correlated, and so
will never be taken _unless_ the buffer holds an object of type `F`.

This commit also removed the various 'diagnostic ignored' pragmas used
to work around this warning, as they no longer do anything.
2021-06-09 23:05:32 +04:30
Idan Horowitz
3bc3a7a23a AK: Use calculate_base64_encoded_length in encode_base64
We were accidentally calling calculate_base64_decoded_length instead,
which resulted in extra allocations during the StringBuilder::append
calls that can be avoided.
2021-05-22 08:54:32 +04:30
Gunnar Beutner
56ee4a1af2 AK: Silence -Wmaybe-uninitialized warning
Adding -fno-semantic-interposition to the GCC command
line caused this new warning.

I don't see how output.data() could be uninitialized here. Also,
commenting out the ensure_capacity() call for the Vector
also gets rid of this warning.
2021-05-03 08:42:39 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
William McPherson
2479ead718 Everywhere: Remove unnecessary clang-format offs
Mostly due to the fact that clang-format allows aligned comments via
AlignTrailingComments.

We could also use raw string literals in inline asm, which clang-format
deals with properly (and would be nicer in a lot of places).
2021-03-04 11:01:48 +01:00
BenJilks
29ada654b1 AK: Fix base64 decoding '/'
When creating the lookup table, it wouldn't add the last
character
2020-11-22 16:07:00 +01:00
Lenny Maiorani
2983215fb1 Base64: Pre-allocate size of input and output
Problem:
- Output of decode and encode grow as the decode and encode
  happen. This is inefficient because a large size will require many
  reallocations.
- `const` qualifiers are missing on variables which are not intended
  to change.

Solution:
- Since the size of the decoded or encoded message is known prior to
  starting, calculate the size and set the output to that size
  immediately. All appends will not incur the reallocation overhead.
- Add `const` qualifiers to show intent.
2020-10-13 23:59:46 +02:00
Lenny Maiorani
626bb1be9c Base64: constexpr initialization of alphabet and lookup table
Problem:
- The Base64 alphabet and lookup table are initialized at
  run-time. This results in an initial start-up cost as well as a
  boolean evaluation and branch every time the function is called.

Solution:
- Provide `constexpr` functions which initialize the alphabet and
  lookup table at compile-time. These can be called and assigned to a
  `constexpr` variable so that there is no run-time cost associated
  with the initialization or lookup.
2020-10-13 18:33:21 +02:00
Ben Wiederhake
5fe6ca75ca AK: Mark compilation-unit-only functions as static
This enables a nice warning in case a function becomes dead code. Also, add forgotten
header to Base64.cpp, which would cause an issue later when we enable -Wmissing-declarations.
2020-08-12 20:40:59 +02:00
asynts
abe925e4b0 AK: Change the signature of AK::encode_base64() to use Span. 2020-07-27 19:58:09 +02:00
Nico Weber
5ba8aba197 AK: Make encode_base64 take a ByteBuffer and return a String
That makes the interface symmetric with decode_base64 and it's
what all current callers want (except for one, which is buggy).
2020-07-22 19:22:00 +02:00
Tom Lebreux
79529ffd47 AK: Add a simple and inefficient Base64 encoder
The test cases are taken from RFC 4648.
2020-06-18 23:21:41 +02:00
Andreas Kling
50c1eca9d4 AK: Add a simple and inefficient Base64 decoder 2020-04-26 22:57:00 +02:00