Instead of checking the header in ZlibDecompressor::create(), we now
check it in read_some() when it is called for the first time. This
resolves a FIXME in the new DecompressionStream implementation.
The following command was used to clang-format these files:
clang-format-18 -i $(find . \
-not \( -path "./\.*" -prune \) \
-not \( -path "./Base/*" -prune \) \
-not \( -path "./Build/*" -prune \) \
-not \( -path "./Toolchain/*" -prune \) \
-not \( -path "./Ports/*" -prune \) \
-type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")
There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
We previously skipped updating the lookback buffer when copying
uncompressed data, which resulted in a wrong total byte count.
With a wrong total byte count, our decompressor implementation
ended up choosing a wrong offset into the dictionary.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
Previously we would calculate the index of the first parent node as
heap.size() (which is initialized to non_zero_freqs), so in the edge
case in which all symbols had a non-zero frequency, we would use the
Size-index entry in the array for both the first symbol's leaf node,
and the first parent node.
The result would either be a non-optimal huffman code (bad), or an
illegal huffman code that would then go on to crash due to an error
check in CanonicalCode::from_bytes. (worse)
We now store parent nodes starting at heap.size() - 1, which eliminates
the potential overlap, and resolves the issue.
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.
Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
The actual cause for the "missing bits" is currently unknown, and this
test case doesn't actually start obviously breaking yet unless we start
reporting errors about missing bits. However, since we are touching the
BitStream implementation already, let's add the test early to make extra
sure that we aren't breaking anything.
Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
The constructor is now only concerned with creating the required
streams, which means that it no longer fails for XZ streams with
invalid headers. Instead, everything is parsed and validated during the
first read, preparing us for files with multiple streams.
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").
Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).
No functional changes, just a lot of new FIXMEs.
This is to differentiate between the upcoming `AllocatingMemoryStream`,
which automatically allocates memory as needed instead of operating on a
static memory area.