Commit graph

24 commits

Author SHA1 Message Date
Zaggy1024
24ae35086d LibGfx/LibVideo: Check for overreads only at end of a VPX range decode
Errors are now deferred until `finish_decode()` is finished, meaning
branches to return errors only need to occur at the end of a ranged
decode. If VPX_DEBUG is enabled, a debug message will be printed
immediately when an overread occurs.
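
As a rough illustration of the deferred-error idea (not the actual
LibGfx/LibVideo code), the sketch below uses a hypothetical
`DeferredErrorReader` whose reads never fail; an overread is only
recorded, and the caller checks for it once in `finish_decode()`.
The class name, `read_byte()`, and the zero padding are assumptions
made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical reader for illustration; not the LibGfx BooleanDecoder.
class DeferredErrorReader {
public:
    explicit DeferredErrorReader(std::vector<uint8_t> data)
        : m_data(std::move(data))
    {
    }

    // Never fails: past the end of the data it yields zero and only
    // records that an overread happened.
    uint8_t read_byte()
    {
        if (m_position >= m_data.size()) {
            m_overread = true;
            return 0;
        }
        return m_data[m_position++];
    }

    // The single place where the caller has to branch on an error.
    // Returns true if no read went past the end of the data.
    bool finish_decode() const { return !m_overread; }

private:
    std::vector<uint8_t> m_data;
    std::size_t m_position { 0 };
    bool m_overread { false };
};
```

The hot path (`read_byte()` here) has no error return to propagate, so
callers need a single check at the end of the range.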

Average decoding times for `Tests/LibGfx/test-inputs/4.webp` improve
by about 4.7% with this change, absolute decode times changing from
27.4ms±1.1ms down to 26.1ms±1.0ms.
2023-06-10 07:17:12 +02:00
Zaggy1024
873b0e9470 LibGfx/LibVideo: Read batches of multiple bytes in VPX BooleanDecoder
This does a few things:

- The decoder uses a 32- or 64-bit integer as a reservoir of the data
  being decoded, rather than one single byte as it was previously.
- `read_bool()` only refills the reservoir (value) when the size drops
  below one byte. Previously, it would read out a bit-sized range from
  the data to completely refill the 8-bit value, doing much more work
  than necessary for each individual read.
- VP9-specific code for reading the marker bit was moved to its own
  function in Context.h.
- A debug flag `VPX_DEBUG` was added to optionally enable checking of
  the final bits in a VPX ranged arithmetic decode and ensure that it
  contains all zeroes. These zeroes are a bitstream requirement for
  VP9, and are also present for all our lossy WebP test inputs
  currently. This can be useful to test whether all the data present in
  the range has been consumed.
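
A minimal sketch of the reservoir idea from the list above, assuming a
64-bit reservoir and the VP8-style split computation from RFC 6386.
This is not the LibGfx implementation: `ReservoirBoolDecoder` and its
members are hypothetical names, and reads past the end are simply
padded with zero bits instead of being recorded.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch; not the actual LibGfx/LibVideo BooleanDecoder.
class ReservoirBoolDecoder {
public:
    explicit ReservoirBoolDecoder(std::vector<uint8_t> data)
        : m_data(std::move(data))
    {
    }

    bool read_bool(uint8_t probability)
    {
        // Refill only when less than one byte of valid bits is left.
        if (m_bits_left < 8)
            refill();

        uint32_t split = 1 + (((m_range - 1) * probability) >> 8);
        // Only the top 8 bits of the reservoir take part in the comparison.
        uint64_t big_split = static_cast<uint64_t>(split) << 56;
        bool bit = m_value >= big_split;
        if (bit) {
            m_range -= split;
            m_value -= big_split;
        } else {
            m_range = split;
        }
        // Renormalize until the range is back in [128, 255].
        while (m_range < 128) {
            m_range <<= 1;
            m_value <<= 1;
            --m_bits_left;
        }
        return bit;
    }

private:
    void refill()
    {
        // Append whole bytes below the valid bits while they still fit.
        while (m_bits_left <= 56) {
            uint64_t byte = 0; // Zero padding once the data runs out.
            if (m_position < m_data.size())
                byte = m_data[m_position++];
            m_value |= byte << (56 - m_bits_left);
            m_bits_left += 8;
        }
    }

    std::vector<uint8_t> m_data;
    std::size_t m_position { 0 };
    uint64_t m_value { 0 };  // Valid bits are aligned to the top.
    int m_bits_left { 0 };   // Number of valid bits in m_value.
    uint32_t m_range { 255 };
};
```

Most calls to `read_bool()` skip `refill()` entirely, since one refill
loads up to eight bytes at once.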

A lot of the size of this diff comes from the removal of error handling
from all the range decoder reads in LibVideo/VP9 and LibGfx/WebP (VP8),
since it is now checked only at the end of the range.

In a benchmark decoding `Tests/LibGfx/test-inputs/4.webp`, decode times
are improved by about 22.8%, reducing average runtime from 35.5ms±1.1ms
down to 27.4ms±1.1ms.

This should cause no behavioral changes.
2023-06-10 07:17:12 +02:00
Nico Weber
2452cf6b55 WebP/Lossy: Allow negative values from segment adjustment too
The spec doesn't talk about this happening in the text, but
`dequant_init()` in 20.4 processes segment adjustment and quantization
index adjustment in the same variable `q` before clamping.
Since we had to adjust the latter step in the previous commit, do
it for the former step too.

I haven't seen this happen in the wild yet (and now, I hopefully
never will notice it if it happens).
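
A minimal sketch of the combined behavior, assuming delta-mode segment
adjustment and the index range [0, 127] used by the quantization
tables. The function and parameter names are made up for illustration
and are not the ones used in LibGfx.

```cpp
#include <algorithm>

// Hypothetical helper: both adjustments accumulate in one signed
// variable `q`, which may go negative, and are clamped once at the end.
int quantization_index(int base_index, int segment_adjustment, int index_delta)
{
    int q = base_index + segment_adjustment + index_delta;
    return std::max(0, std::min(q, 127));
}
```

For example, `quantization_index(0, -5, 0)` yields 0 rather than a
negative table index.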
2023-06-01 17:36:20 +02:00
Nico Weber
661b2d394d WebP/Lossy: Clamp negative quantization indices to zero
The spec doesn't talk about this happening in the text, but
`dequant_init()` in 20.4 stores `q` in an int and clamps that
to 0 later.
2023-06-01 17:36:20 +02:00
Nico Weber
287e2655cb WebP/Lossy: Add a missing clamp
The spec says that the AC dequantization factor for Y2 data should
be at least 8, so do that.

This only has a very small effect (only the first two AC table
entries are < 8 after multiplying with 155 / 100, so this would
have only a small effect on brightness), and this case is hit
exactly 0 times in all my test images.  But it's still good to match
the spec.
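
As a worked example, assuming the AC table starts with the entries
4, 5, 6 as in RFC 6386 (`y2_ac_factor` is a hypothetical name):

```cpp
#include <algorithm>

// Hypothetical helper: scale the AC table entry for Y2 by 155 / 100
// (integer arithmetic) and enforce the minimum of 8.
int y2_ac_factor(int ac_table_entry)
{
    return std::max(ac_table_entry * 155 / 100, 8);
}
```

With those table entries, 4 and 5 scale to 6 and 7 and get raised
to 8, while 6 already scales to 9 and is unaffected.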
2023-06-01 16:23:46 +02:00
Nico Weber
24aa302e88 WebP/Lossy: Reduce size of MacroblockMetadata from 80 to 20 bytes
For a 1024x1024 image, saves about a quarter MB of memory use while
decoding (compared to the decompressed image data itself needing
4 MiB).  Not a huge win, but also very easy to do, so might as well.
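
The arithmetic behind that estimate, assuming one `MacroblockMetadata`
per 16x16 macroblock (the helper names are made up for this example):

```cpp
#include <cstddef>

// Number of 16x16 macroblocks needed to cover the image.
constexpr std::size_t macroblock_count(std::size_t width, std::size_t height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

// Bytes saved by shrinking each metadata entry from 80 to 20 bytes.
constexpr std::size_t metadata_bytes_saved(std::size_t width, std::size_t height)
{
    return macroblock_count(width, height) * (80 - 20);
}

// 1024x1024 pixels -> 64 * 64 = 4096 macroblocks -> 245760 bytes (240 KiB).
static_assert(metadata_bytes_saved(1024, 1024) == 245760, "about a quarter MB");
```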

No behavior change, no measurable performance impact.
2023-06-01 16:23:46 +02:00
Nico Weber
8a40b49b8b WebP/Lossy: Use 8-bit buffers for prediction and YUV data
This is safe because:

* prediction only computes averages, or explicitly clamps for
  TM_PRED / B_TM_PRED. Since the inputs are in [0, 255], so will the
  outputs.
* Addition of IDCT and prediction buffer is immediately clamped back
  to [0, 255]

No behavior change, and matches what both libwebp and the reference
implementation in rfc6386 do.
2023-05-31 22:38:36 +02:00
Nico Weber
ffae065593 WebP/Lossy: Clamp right after summing IDCT output, instead of later
https://datatracker.ietf.org/doc/html/rfc6386#section-14.5 says:

"""
The summing procedure is fairly straightforward, having only a couple
of details.  The prediction and residue buffers are both arrays of
16-bit signed integers.  Each individual (Y, U, and V pixel) result
is calculated first as a 32-bit sum of the prediction and residue,
and is then saturated to 8-bit unsigned range (using, say, the
clamp255 function defined above) before being stored as an 8-bit
unsigned pixel value.
"""

It's IMHO not 100% clear if the clamping is supposed to happen
immediately (so that it affects prediction inputs for the next
macroblock) or later.

But vp8_dixie_idct_add() on page 173 in
https://datatracker.ietf.org/doc/html/rfc6386#section-20.8 does:

    recon[0] = CLAMP_255(predict[0] + ((a1 + d1 + 4) >> 3));

So it does look like it should happen immediately.
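
A minimal sketch of clamping at the summing step, so that the stored
value (and therefore any later prediction input) is already in
[0, 255]. `reconstruct_pixel` is a hypothetical name, and `residue`
stands for the already-rounded IDCT output.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: sum prediction and residue in a wider type,
// then saturate to the 8-bit unsigned range before storing.
uint8_t reconstruct_pixel(uint8_t prediction, int residue)
{
    int sum = prediction + residue;
    return static_cast<uint8_t>(std::max(0, std::min(sum, 255)));
}
```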

(I'm a bit confused why the spec then says "The prediction and residue
buffers are both arrays of 16-bit signed integers", since the
prediction buffer can just be a u8 buffer now, without changing
behavior.)
2023-05-31 22:38:36 +02:00
Nico Weber
cf934f9bfc WebP/Lossy: Move two enums closer to the struct that uses them
I accidentally inserted a bunch of code in between.

No behavior change.
2023-05-31 16:40:40 +02:00
Nico Weber
830a3a25dc WebP/Lossy: Add a missing clamp() in TM_PRED prediction
The spec has that clamp at the end of
https://datatracker.ietf.org/doc/html/rfc6386#section-12.2:

    The exact algorithm is as follows:
    [...]
               b[r][c] = clamp255(L[r]+ A[c] - P);

For the test images I'm looking at, it doesn't seem to make a
dramatic difference, but omitting it in `B_TM_PRED` did make
a dramatic difference, so add it. (Also, the spec demands it.)
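
The formula from the spec can be sketched as follows. A 4x4 block is
used only to keep the example short, and the names (`tm_predict_4x4`,
`left`, `above`, `above_left`) are assumptions for illustration; they
correspond to `L`, `A`, and `P` in the spec's formula.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Row4 = std::array<uint8_t, 4>;
using Block4x4 = std::array<Row4, 4>;

// Saturate an int to [0, 255], like clamp255() in the spec.
inline uint8_t clamp255(int value)
{
    return static_cast<uint8_t>(std::max(0, std::min(value, 255)));
}

// Hypothetical sketch of TM prediction: b[r][c] = clamp255(L[r] + A[c] - P).
Block4x4 tm_predict_4x4(Row4 const& left, Row4 const& above, uint8_t above_left)
{
    Block4x4 block {};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c)
            block[r][c] = clamp255(left[r] + above[c] - above_left);
    }
    return block;
}
```

Without the clamp, `left[r] + above[c] - above_left` can range from
-255 to 510, which would wrap around when stored in a `uint8_t`.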
2023-05-31 16:22:49 +02:00
Nico Weber
40e1ec6cf9 WebP/Lossy: Remove an unnecessary branch
`predicted_y_above` is initialized to a row of 127s, so we can just
read from it even in the first macroblock row.

No behavior change.
2023-05-31 15:28:41 +02:00
Nico Weber
a2d8de180c WebP/Lossy: Add support for images with more than one partition
Each secondary partition has an independent BooleanDecoder.
Their bitstreams interleave per macroblock row, that is the first
macroblock row is read from the first decoder, the second from the
second, ..., until it wraps around again.
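
In other words, the decoder for a macroblock row is picked round-robin.
A small sketch, where `partition_for_row` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: index of the secondary partition (and thus the
// BooleanDecoder) that holds the coefficients of a macroblock row.
std::size_t partition_for_row(std::size_t macroblock_row, std::size_t partition_count)
{
    assert(partition_count > 0);
    return macroblock_row % partition_count;
}
```

With 3 partitions, rows 0, 1, 2, 3 map to partitions 0, 1, 2, 0.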

All partitions share a single prediction state though: The second
macroblock row (which reads coefficients off the second decoder) is
predicted using the result of decoding the first macroblock row (which
reads coefficients off the first decoder).

So if I understand things right, in theory the coefficient reading could
be parallelized, but prediction can't be. (IDCT can also be
parallelized, but that's true with just a single partition too.)

I created the test image by running

    examples/cwebp -low_memory -partitions 3 -o foo.webp \
        ~/src/serenity/Tests/LibGfx/test-inputs/4.webp

using a cwebp hacked up as described in #19149. Since creating
multi-partition lossy webps requires hacking up `cwebp`, they're likely
very rare in practice. (But maybe other programs using the libwebp API
create them.)

Fixes #19149.

With this, webp lossy support is complete (*) :^)

And with that, webp support is complete: Lossless, lossy, lossy with
alpha, animated lossless, animated lossy, animated lossy with alpha all
work.

(*: Loop filtering isn't implemented yet, which has a minor visual
effect on the output. But it's only visible when carefully comparing
a webp decoded without loop filtering to the same decoded with it.
But it's technically a part of the spec that's still missing.

The upsampling of UV in the YUV->RGB code is also low-quality. This
produces somewhat visible banding in practice in some images (e.g.
in the fire breather's face in 5.webp), so we should probably improve
that at some point. Our JPG decoder has the same issue.)
2023-05-31 14:07:15 +02:00
Nico Weber
f3beff0930 WebP/Lossy: Tweak some comments 2023-05-30 06:14:56 +02:00
Nico Weber
74b50c046b WebP/Lossy: Check that file contains enough data for first partition 2023-05-30 06:14:56 +02:00
Nico Weber
f8e4a0a268 WebP/Lossy: Implement prediction and inverse DCT
This could be a bit prettier, but it works :^)
2023-05-29 19:44:45 +02:00
Nico Weber
d1b5eec154 WebP/Lossy: Implement macroblock coefficient decoding
This basically adds the line

    u8 token = TRY(
        tree_decode(decoder, COEFFICIENT_TREE,
        header.coefficient_probabilities[plane][band][tricky],
        last_decoded_value == DCT_0 ? 2 : 0));

and calls it once for the 16 coefficients of a discrete cosine transform
that covers a 4x4 pixel subblock.

And then it does this 24 or 25 times per macroblock: for the 16 (4x4)
Y subblocks, for the 4 (2x2) U and 4 (2x2) V subblocks, plus an
optional Y2 block for some macroblocks.
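
The count works out as follows (the function name is made up for this
example):

```cpp
// Hypothetical helper: number of 4x4-pixel subblocks whose coefficients
// are decoded for one macroblock.
constexpr int subblocks_per_macroblock(bool has_y2)
{
    return 4 * 4              // Y: 4x4 grid of subblocks
        + 2 * 2               // U: 2x2 grid of subblocks
        + 2 * 2               // V: 2x2 grid of subblocks
        + (has_y2 ? 1 : 0);   // optional Y2 block
}

static_assert(subblocks_per_macroblock(false) == 24, "without Y2");
static_assert(subblocks_per_macroblock(true) == 25, "with Y2");
```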

It then adds a whole bunch of machinery to be able to compute `plane`,
`band`, and in particular `tricky` (which depends on if the
corresponding left or above subblock has non-zero coefficients).

(It also does dequantization, and does an inverse Walsh-Hadamard
transform when needed, to end up with complete DCT coefficients
in all cases.)
2023-05-29 10:41:53 -06:00
Nico Weber
d15ae9fa93 WebP/Lossy: It's 'macroblock', not 'metablock'
Somehow my brain decided to change the name of this concept.
Not sure why, the spec consistently uses 'macroblock'.

No behavior change.
2023-05-28 18:01:31 +02:00
Nico Weber
ea54c58930 WebP/Lossy: Variable naming fix for constants from last pull request 2023-05-27 15:25:00 -06:00
Nico Weber
bd5290dd45 WebP/Lossy: Add code to read macroblock metadata 2023-05-27 15:25:00 -06:00
Nico Weber
8defd55349 WebP/Lossy: Add code to read the frame header in the first partition 2023-05-27 08:31:03 -06:00
Nico Weber
bbc1f57d1e WebP/Lossy: Add a comment with a summary of the file format 2023-05-27 08:31:03 -06:00
Nico Weber
703bd4c8a3 WebP/Lossy: Validate show_frame and version when reading header 2023-05-24 16:09:40 +02:00
Nico Weber
dd2ca56ee4 LibGfx/WebP: Add two missing closing quotes for spec comments 2023-05-09 06:35:56 +02:00
Nico Weber
bc207fd0a0 LibGfx/WebP: Move lossy decoder to its own file
Pure code move (except for removing `static` on the two public functions
in the new header), no behavior change.

There isn't a lot of lossy decoder yet, but it'll make implementing it
more convenient.

No behavior change.
2023-05-09 06:35:56 +02:00