Commit graph

36 commits

Author SHA1 Message Date
Tim Schumacher
cb03d3d78f AK: Allow rejecting BitStream reads beyond EOF 2023-12-01 12:48:18 +01:00
Tim Schumacher
de49413bdf AK: Don't slice off the first byte of a BitStream read 2023-12-01 12:48:18 +01:00
Lucas CHOLLET
6f059c9d60 AK: Add the InputBitStream concept
This will allow users to abstract away the endianness of the stream they
are using.
2023-11-12 13:56:27 +01:00
kleines Filmröllchen
c9f6605fb2 AK: Account for bit position 8 in bit stream alignment
See identical code in LittleEndianBitStream; even in the bytewise
reading BigEndianBitStream an offset of 8 is not inconsistent state and
handled just fine by read_bits.
2023-05-18 22:23:15 +02:00
Timothy Flynn
70c977aa56 AK: Discard bits from LittleEndianInputBitStream as they are read
Rather than tracking our position in the bit buffer, we can simply shift
away the bits that we read. This is mostly for simplicity, but also does
help performance a bit.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    3.96s to 3.79s on Serenity (cold)
    1.08s to 1.04s on Serenity (warm)
    0.83s to 0.82s on Linux
2023-05-18 11:21:56 -07:00
Tim Schumacher
56d861ebe0 AK: Prevent bit counter underflows in the new BitStream
Our current `peek_bits` function allows retrieving more bits than we can
actually provide, so whenever someone discards the requested bit count
afterwards we were underflowing the value instead.
2023-05-04 20:01:16 +02:00
Timothy Flynn
eed956b473 AK: Increase LittleEndianOutputBitStream's buffer size and remove loops
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.

We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    13.62s to 10.9s on Serenity (cold)
    13.62s to 9.22s on Serenity (warm)
    2.93s to 2.32s on Linux

One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
2023-04-02 10:54:37 +02:00
Timothy Flynn
7319deb03d AK: Remove BitStream workaround for now-resolved BufferedStream behavior
The issue was that the buffer would only be filled if it was empty.
2023-03-30 08:47:22 +02:00
Nico Weber
e1f8443db0 AK: Fix build with Xcode 14.2's clang
Else:

    AK/BitStream.h:218:24:
        error: inline function '...::lsb_mask<unsigned char>' is not
               defined [-Werror,-Wundefined-inline]
        static constexpr T lsb_mask(T bits)
                           ^
2023-03-29 06:28:55 -04:00
Timothy Flynn
8e834d4bb2 AK: Increase LittleEndianInputBitStream's buffer size and remove loops
We current buffer one byte of data from the underlying stream. And when
we pull bits off that buffer, we do so 1 or 8 bits at a time (depending
on whether the buffer is byte aligned). The 1-bit-at-a-time loop is by
far the most common during e.g. GZIP decompression.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to extract the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:

    242s to 35s on Serenity
    11.125s to 3.527s on Linux

Note that BigEndianInputBitStream can also use the same techniques,
and some of the methods here may make sense to live in an endianness-
agnostic base class. The focus is GZIP right now though, which only
uses the little endian stream.
2023-03-29 07:19:14 +02:00
Tim Schumacher
e007279315 AK: Read and write accumulated BitStream bits directly 2023-03-13 15:16:20 +00:00
Tim Schumacher
b623dda765 AK: Remove unneeded overrides for write_until_depleted from BitStream 2023-03-13 15:16:20 +00:00
Tim Schumacher
ecd1862859 AK: Rename Stream::write_entire_buffer to Stream::write_until_depleted
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher
d5871f5717 AK: Rename Stream::{read,write} to Stream::{read_some,write_some}
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").

Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).

No functional changes, just a lot of new FIXMEs.
2023-03-13 15:16:20 +00:00
Tim Schumacher
8b2f23d016 AK: Remove the fallible constructor from LittleEndianOutputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
0fee97916b AK: Remove the fallbile constructor from BigEndianOutputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
261d62438f AK: Remove the fallible constructor from LittleEndianInputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
fa09152e23 AK: Remove the fallible constructor from BigEndianInputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
49b30d3013 AK: Remove InputBitStream and OutputBitStream 2023-01-21 00:45:33 +00:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Lenny Maiorani
ef4b98be52 AK: Add nodiscard attribute to BitStream functions 2022-07-04 05:53:56 +00:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
kleines Filmröllchen
caeb8fc691 LibCore: Introduce BigEndianInputBitStream
BigEndianInputBitStream is the Core::Stream API's bitwise input stream
for big endian input data. The functionality and bitwise read API is
almost unchanged from AK::BitStream, except that this bit stream only
supports big endian operations.

As the behavior for mixing big endian and little endian reads on
AK::BitStream is unknown (and untested), it was never done anyways. So
this was a good opportunity to split up big endian and little endian
reading.

Another API improvement from AK::BitStream is the ability to specify
the return type of the bit read function. Always needing to static_cast
the result of BitStream::read_bits_big_endian into the desired type is
adding a lot of avoidable noise to the users (primarily FlacLoader).
2022-01-22 01:13:42 +03:30
kleines Filmröllchen
9d4c50ca60 AK: Add big endian bit reading to InputBitStream
The existing InputBitStream methods only read in little endian, as this
is what the rest of the system requires. Two new methods allow the input
bitstream to read bits in big endian as well, while using the existing
state infrastructure.

Note that it can lead to issues if little endian and big endian reads
are used out of order without aligning to a byte boundary first.
2021-06-25 20:48:14 +04:30
Idan Horowitz
1c512a702a AK+Userland: Use idan.horowitz@serenityos.org for my copyright headers 2021-04-22 22:42:38 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Idan Horowitz
ea5f83616e LibCompress+AK: Dont short-circuit error handling propagation
In the case that both the stream and the wrapped substream had errors
to be handled only one of the two would be resolved due to boolean
short circuiting. this commit ensures both are handled irregardless
of one another.
2021-03-16 14:56:50 +01:00
Idan Horowitz
a955fd4156 LibCompress+AK: Propagate error handling to wrapped streams
This ensures that when a DeflateCompressor stream is cleared of any
errors its underlying wrapped streams (InputBitStream/InputMemoryStream)
will be cleared as well and wont fail a VERIFY on destruction.
2021-03-15 21:35:48 +01:00
Idan Horowitz
1b7b503bae AK: Add fast paths for aligned bit writes in BitOutputStream
If the bit write is aligned (or has been aligned during the write) we can
write in multiples of 32/16/8 bits for increased performance.
2021-03-13 23:50:07 +01:00
Idan Horowitz
0ddc0e45ae AK: Add OutputBitStream class
This will be used in the deflate compressor.
2021-03-13 20:07:25 +01:00
asynts
96edcbc27c AK: Lower the requirements for InputStream::eof and rename it.
Consider the following snippet:

    void foo(InputStream& stream) {
        if(!stream.eof()) {
            u8 byte;
            stream >> byte;
        }
    }

There is a very subtle bug in this snippet, for some input streams eof()
might return false even if no more data can be read. In this case an
error flag would be set on the stream.

Until now I've always ensured that this is not the case, but this made
the implementation of eof() unnecessarily complicated.
InputFileStream::eof had to keep a ByteBuffer around just to make this
possible. That meant a ton of unnecessary copies just to get a reliable
eof().

In most cases it isn't actually necessary to have a reliable eof()
implementation.

In most other cases a reliable eof() is avaliable anyways because in
some cases like InputMemoryStream it is very easy to implement.
2020-09-14 20:58:12 +02:00
Ben Wiederhake
0d3a8d5397 AK: Fix accidentally-recursive call in BitStream 2020-09-12 00:13:29 +02:00
asynts
6de63782c7 Streams: Consistent behaviour when reading from stream with error.
The streaming operator doesn't short-circuit, consider the following
snippet:

    void foo(InputStream& stream) {
        int a, b;
        stream >> a >> b;
    }

If the first read fails, the second is called regardless. It should be
well defined what happens in this case: nothing.
2020-09-06 12:54:45 +02:00
asynts
9ce4475907 Streams: Distinguish recoverable and fatal errors. 2020-09-01 17:25:26 +02:00
asynts
71cbf72e8a AK: Add InputBitStream class. 2020-08-26 21:07:53 +02:00