Commit graph

32 commits

Author SHA1 Message Date
Tim Schumacher
127f6ed6eb LibCompress: Fix a typo in m_read_final_block 2023-10-09 23:40:10 +02:00
Nico Weber
6d38824985 LibCompress: Tolerate more than 288 entries in CanonicalCode
Webp lossless can have up to 2328 symbols. This code assumed the deflate
max of 288, leading to crashes for webp lossless files using more than
288 symbols (such as Tests/LibGfx/test-inputs/simple-vp8l.webp).

Nothing writes webp files at this point, so the m_bit_codes and
m_bit_code_lengths arrays aren't ever used in practice with more than
288 entries.
2023-04-07 20:49:39 +02:00
Nico Weber
85d0637058 LibCompress: Make CanonicalCode::from_bytes() return ErrorOr<>
No intended behavior change.
2023-04-02 06:19:46 +02:00
Timothy Flynn
9f238793e0 gunzip+LibCompress: Increase buffer sizes used by Deflate and gunzip
Co-authored-by: Andreas Kling <kling@serenityos.org>
2023-03-31 06:56:11 +02:00
Timothy Flynn
7447a91d7e LibCompress: Decode non-self-referencing back-references in one shot
We currently decode back-references one byte at a time, while writing
that byte back out to the output buffer. This is only necessary when the
back-reference refers to itself, i.e. when the back-reference distance
is less than its length. In other cases, we can read the entire back-
reference block in one shot.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:

    5.8s to 4.89s on Serenity (cold)
    2.3s to 1.72s on Serenity (warm)
    1.6s to 1.06s on Linux
2023-03-29 13:22:11 +01:00
Timothy Flynn
5aaefe4e62 LibCompress: Use prefix tables to decode Huffman codes up to 8 bits long
Huffman codes have a useful property in that they are prefix codes. That
is, a set of bits representing a Huffman-coded symbol is never a prefix
of another symbol. This allows us to create a table, where each index in
the table are integers whose prefix is the entry's corresponding Huffman
code.

With Deflate, we can have codes up to 16 bits in length, thus creating a
prefix table with 2^16 entries. So instead of creating a table fit all
possible codes, we use a cutoff of 8-bit codes. Codes larger than 8 bits
fall back to the binary search method.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from 3.527s to 2.585s on Linux.
2023-03-29 07:19:14 +02:00
Timothy Flynn
20aaab47f9 LibCompress: Use a bit stream for the entire GZIP decompression process
We currently mix normal and bit streams during GZIP decompression, where
the latter is a wrapper around the former. This isn't causing issues now
as the underlying bit stream buffer is a byte, so the normal stream can
pick up where the bit stream left off.

In order to increase the size of that buffer though, the normal stream
will not be able to assume it can resume reading after the bit stream.
The buffer can easily contain more bits than it was meant to read, so
when the normal stream resumes, there may be N bits leftover in the bit
stream that the normal stream was meant to read.

To avoid weird behavior when mixing streams, this changes the GZIP
decompressor to always read from a bit stream.
2023-03-29 07:19:14 +02:00
Tim Schumacher
d5871f5717 AK: Rename Stream::{read,write} to Stream::{read_some,write_some}
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").

Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).

No functional changes, just a lot of new FIXMEs.
2023-03-13 15:16:20 +00:00
Tim Schumacher
43f98ac6e1 Everywhere: Remove the AK:: qualifier from Stream usages 2023-02-13 00:50:07 +00:00
Tim Schumacher
874c7bba28 LibCore: Remove Stream.h 2023-02-13 00:50:07 +00:00
Tim Schumacher
2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
8464da1439 AK: Move Stream and SeekableStream from LibCore
`Stream` will be qualified as `AK::Stream` until we remove the
`Core::Stream` namespace. `IODevice` now reuses the `SeekMode` that is
defined by `SeekableStream`, since defining its own would require us to
qualify it with `AK::SeekMode` everywhere.
2023-01-29 19:16:44 -07:00
Tim Schumacher
5f2ea31816 AK: Move Handle from LibCore and name it MaybeOwned
The new name should make it abundantly clear what it does.
2023-01-29 19:16:44 -07:00
Tim Schumacher
c4da1be32c LibCompress: Remove all leftover AK::Stream headers 2023-01-13 17:34:45 -07:00
Tim Schumacher
46a53dc6e0 LibCompress: Switch the deflate seekback buffer to CircularBuffer 2023-01-13 17:34:45 -07:00
Tim Schumacher
d23f0a7405 LibCompress: Switch DeflateDecompressor to a fallible constructor
We don't have anything fallible in there yet, but we will soon switch
the seekback buffer to the new `CircularBuffer`, which has a fallible
constructor.

We have to do the same for the internal `GzipDecompressor::Member`
class, as it needs to construct a `DeflateCompressor` from its received
stream.
2023-01-13 17:34:45 -07:00
Tim Schumacher
f4afee4278 LibCompress: Switch DeflateCompressor to a fallible constructor 2023-01-10 10:28:26 +01:00
Tim Schumacher
8cd2cf2b77 LibCompress: Port DeflateCompressor to Core::Stream 2023-01-10 10:28:26 +01:00
Tim Schumacher
0bdbe27d6b LibCore: Rename InputBitStream.h to BitStream.h
We won't just be defining readable streams here from now on, but also
writable streams.
2023-01-10 10:28:26 +01:00
Tim Schumacher
30abd47099 LibCompress: Port DeflateDecompressor to Core::Stream 2022-12-12 16:21:39 +00:00
Lenny Maiorani
9afc7d5379 LibCompress: Change DeflateSpecialCodeLengths to constexpr variables 2022-04-03 17:36:48 +01:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Andreas Kling
80d4e830a0 Everywhere: Pass AK::ReadonlyBytes by value 2021-11-11 01:27:46 +01:00
Linus Groh
649d2faeab Everywhere: Use "the SerenityOS developers." in copyright headers
We had some inconsistencies before:

- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."

I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.

By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
2021-04-29 00:59:26 +02:00
Idan Horowitz
1c512a702a AK+Userland: Use idan.horowitz@serenityos.org for my copyright headers 2021-04-22 22:42:38 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Idan Horowitz
a955fd4156 LibCompress+AK: Propagate error handling to wrapped streams
This ensures that when a DeflateCompressor stream is cleared of any
errors its underlying wrapped streams (InputBitStream/InputMemoryStream)
will be cleared as well and wont fail a VERIFY on destruction.
2021-03-15 21:35:48 +01:00
Idan Horowitz
02b4cb96f8 LibCompress: Decrease CanonicalCode's size on stack
This commit stores the bit codes as u16s instead of u32s as the
maximum code bit length in DEFLATE is 15.
2021-03-14 14:52:21 +01:00
Idan Horowitz
7e587a615e LibCompress: Handle literal only lz77 streams in DeflateCompressor
Very incompressible data could sometimes produce no backreferences
which would result in no distance huffman code being created (as it
was not needed), so VERIFY the code exists only if it is actually
needed for writing the stream.
2021-03-14 11:05:35 +01:00
Idan Horowitz
b1e3176f9f LibCompress: Replace goto with simple recursion in DeflateCompressor
This is just a bit easier on the eyes :^)
2021-03-13 23:50:07 +01:00
Idan Horowitz
bcbfa7db62 LibCompress: Implement DEFLATE compression
This commit adds a fully functional DEFLATE compression
implementation that can be used to implement compression
for higher level formats like gzip, zlib or zip.

A large part of this commit is based on Hans Wennborg's
great article about the DEFLATE and zip specifications:
https://www.hanshq.net/zip.html
2021-03-13 20:07:25 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Renamed from Libraries/LibCompress/Deflate.h (Browse further)