We don't really need the features provided by StringBuilder here, since
we know the exact size of the output. Avoiding StringBuilder avoids the
recurring capacity/size checks both within StringBuilder itself and its
internal ByteBuffer.
This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
0.976s to 0.428s.
(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
We know we are only appending ASCII characters to the StringBuilder, so
do not bother validating the result.
This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
1.192s to 0.976s.
(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
This encoding scheme comes from section 5 of RFC 4648, as an
alternative to the standard base64 encode/decode methods.
The only difference is that the last two characters are replaced
with '-' and '_', as '+' and '/' are not safe in URLs or filenames.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.
Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979
It's prone to finding "technically uninitialized but can never happen"
cases, particularly in Optional<T> and Variant<Ts...>.
The general case seems to be that it cannot infer the dependency
between Variant's index (or Optional's boolean state) and a particular
alternative (or Optional's buffer) being untouched.
So it can flag cases like this:
```c++
if (index == StaticIndexForF)
new (new_buffer) F(move(*bit_cast<F*>(old_buffer)));
```
The code in that branch can _technically_ make a partially initialized
`F`, but that path can never be taken since the buffer holding an
object of type `F` and the condition being true are correlated, and so
will never be taken _unless_ the buffer holds an object of type `F`.
This commit also removed the various 'diagnostic ignored' pragmas used
to work around this warning, as they no longer do anything.
We were accidentally calling calculate_base64_decoded_length instead,
which resulted in extra allocations during the StringBuilder::append
calls that can be avoided.
Adding -fno-semantic-interposition to the GCC command
line caused this new warning.
I don't see how output.data() could be uninitialized here. Also,
commenting out the ensure_capacity() call for the Vector
also gets rid of this warning.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
Mostly due to the fact that clang-format allows aligned comments via
AlignTrailingComments.
We could also use raw string literals in inline asm, which clang-format
deals with properly (and would be nicer in a lot of places).
Problem:
- Output of decode and encode grow as the decode and encode
happen. This is inefficient because a large size will require many
reallocations.
- `const` qualifiers are missing on variables which are not intended
to change.
Solution:
- Since the size of the decoded or encoded message is known prior to
starting, calculate the size and set the output to that size
immediately. All appends will not incur the reallocation overhead.
- Add `const` qualifiers to show intent.
Problem:
- The Base64 alphabet and lookup table are initialized at
run-time. This results in an initial start-up cost as well as a
boolean evaluation and branch every time the function is called.
Solution:
- Provide `constexpr` functions which initialize the alphabet and
lookup table at compile-time. These can be called and assigned to a
`constexpr` variable so that there is no run-time cost associated
with the initialization or lookup.
This enables a nice warning in case a function becomes dead code. Also, add forgotten
header to Base64.cpp, which would cause an issue later when we enable -Wmissing-declarations.