This patch brings a few small QoL improvements:
- We don't need to read the Huffman stream before returning an error
due to a missing quantization table.
- We check the table presence only once per scan instead of once per
MCU.
- `dequantize()` is now infallible.
The `{:#}` style automatically adapts the prefix for binary and
octal output, and is what we already use in the majority of cases.
Patch generated by:

    rg -l '0x\{' | xargs sed -i '' -e 's/0x{:/{:#/'

I ran it 4 times (until it stopped changing things) since each
invocation only converted one instance per line.
No behavior change.
JPEGStream::byte_offset() now returns an offset relative to the start
of the stream, instead of relative to the buffered part.
No behavior change except if JPEG_DEBUG is set.
JPEGs can store a `restart_interval`, which controls how many
minimum coded units (MCUs) apart the stream state resets.
This can be used for error correction, decoding parts of a jpeg
in parallel, etc.
We tried to use

    u32 i = vcursor * context.mblock_meta.hpadded_count + hcursor;
    i % (context.dc_restart_interval *
         context.sampling_factors.vertical *
         context.sampling_factors.horizontal) == 0

to check whether we had just decoded a multiple of
`restart_interval` MCUs.
`hcursor` is the horizontal offset, in units of 8x8 blocks,
`vcursor` the vertical offset, and `hpadded_count` stores how many
8x8 blocks we have per row, padded to a multiple of the sampling
factor.
This isn't quite right if `hcursor` isn't divisible by both the
vertical and the horizontal sampling factor. Tweak things so that
they work.
Also rename `i` to `number_of_mcus_decoded_so_far`, since that's
what it is, at least now.
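Roughly, the corrected bookkeeping looks like this (the identifiers
follow the description above, not the actual code from the patch):

    #include <AK/Types.h>

    // Sketch only. The cursors advance in data units: hcursor by the
    // horizontal sampling factor, vcursor by the vertical one, and
    // hpadded_count is padded to a multiple of the horizontal factor.
    static bool is_restart_boundary(u32 vcursor, u32 hcursor,
        u32 hpadded_count, u32 vertical_factor, u32 horizontal_factor,
        u32 restart_interval)
    {
        u32 mcus_per_row = hpadded_count / horizontal_factor;
        u32 number_of_mcus_decoded_so_far =
            (vcursor / vertical_factor) * mcus_per_row
            + hcursor / horizontal_factor;
        return number_of_mcus_decoded_so_far % restart_interval == 0;
    }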
For the test case, I converted an existing image to a ppm:
    Build/lagom/bin/image -o out.ppm \
        Tests/LibGfx/test-inputs/jpg/12-bit.jpg
Then I resized it to 102x77px in Photoshop and saved it again.
Then I turned it into a jpeg like so:
    path/to/cjpeg \
        -outfile Tests/LibGfx/test-inputs/jpg/odd-restart.jpg \
        -sample 2x2,1x1,1x1 -quality 5 -restart 3B out.ppm
The trick here is to:
a) Pick a size that's not divisible by the data unit width (8),
   and that, when rounded up to whole 8x8 blocks (13), still isn't
   divisible by the subsample factor -- done by picking a width
   of 102.
b) Pick a Huffman table that doesn't happen to contain the bit
   pattern for a restart marker, so that reading a restart marker
   from the bitstream as data causes a failure (-quality 5 happens
   to do this).
c) Pick a restart interval that we fail to skip over if our
   calculation is off (-restart 3B).
Together with #22987, fixes #22780.
Non-interleaved files always have an MCU of one data unit.
(A "data unit" is an 8x8 tile of pixels, and an "MCU" is a
"minium coded unit", e.g. 2x2 data units for luminance and
1 data unit each for Cr and Cb for a YCrCb image with
4:2:0 subsampling.)
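As a sketch of what this means for the decode loop (names made up,
not from the patch):

    #include <AK/Types.h>

    // MCU geometry, in data units. In a non-interleaved scan (one
    // single component), every MCU is one 8x8 data unit, whatever
    // sampling factors the component declares. Sketch only.
    struct McuSizeInDataUnits {
        u8 horizontal { 1 };
        u8 vertical { 1 };
    };

    static McuSizeInDataUnits mcu_size_for_scan(
        size_t components_in_scan, u8 declared_horizontal,
        u8 declared_vertical)
    {
        if (components_in_scan == 1)
            return { 1, 1 };
        return { declared_horizontal, declared_vertical };
    }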
For the test case, I converted an existing image to a ppm:
    Build/lagom/bin/image -o out.ppm \
        Tests/LibGfx/test-inputs/jpg/12-bit.jpg
Then I converted it to grayscale and saved it as a pgm in Photoshop.
Then I turned it into a weird jpeg like so:
    path/to/cjpeg \
        -outfile Tests/LibGfx/test-inputs/jpg/grayscale_mcu.jpg \
        -sample 2x2 -restart 3 out.pgm
Makes 3 of the 5 jpegs that were failing to decode in #22780 work.
That's all this function reads from Component.
Also rename from validate_luma_and_modify_context() to
validate_sampling_factors_and_modify_context().
No behavior change.
When decoding a CMYK image and asked for a normal `frame()`, the decoder
would convert the CMYK bitmap into an RGB bitmap. Calling `cmyk_frame()`
after that point would then provoke a null-dereference.
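To illustrate the broken sequence (hypothetical repro, assuming the
plugin's usual factory and accessors; not a test from the patch):

    #include <LibGfx/ImageFormats/JPEGLoader.h>

    // Sketch: both calls go to the same decoder instance.
    static ErrorOr<void> reproduce_crash(ReadonlyBytes jpeg_bytes)
    {
        auto plugin = TRY(Gfx::JPEGImageDecoderPlugin::create(jpeg_bytes));
        (void)TRY(plugin->frame(0));      // converts the CMYK bitmap to RGB
        (void)TRY(plugin->cmyk_frame());  // used to hit the now-gone CMYK bitmap
        return {};
    }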
SamplingFactors already has default initializers for its fields,
so there's no need for an explicit one for the first of the two
fields.
No behavior change.
We now allow all subsampling factors where the subsampling factors
of follow-on components evenly divide the ones of the first component.
In practice, this allows YCCK 2111, CMYK 2112, and CMYK 2111.
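A sketch of that acceptance rule (assumed shapes, not the actual
check):

    #include <AK/Types.h>
    #include <AK/Vector.h>

    struct SamplingFactors {
        u8 horizontal { 1 };
        u8 vertical { 1 };
    };

    // Accept the factors if each follow-on component's factors
    // evenly divide the first component's. Sketch only.
    static bool sampling_factors_are_supported(
        Vector<SamplingFactors> const& factors)
    {
        auto const& first = factors.first();
        for (size_t i = 1; i < factors.size(); ++i) {
            if (factors[i].horizontal == 0 || factors[i].vertical == 0)
                return false;
            if (first.horizontal % factors[i].horizontal != 0
                || first.vertical % factors[i].vertical != 0)
                return false;
        }
        return true;
    }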
Previously, we handled sampling factors as part of ycbcr_to_rgb().
That meant it worked ok for code paths that used YCbCr ("normal"
jpegs, and the YCC part of YCCK jpegs), but it didn't work for
example for the K channel in YCCK jpegs, nor for CMYK.
By making this a separate pass, it should now work for all cases.
It also makes it easier to support more subsampling arrangements
in the future, and to use something better than nearest neighbor
for upsampling subsampled blocks.
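For illustration, a nearest-neighbor upsampling pass over a single
subsampled plane could look roughly like this (hypothetical helper,
flat row-major planes assumed):

    #include <AK/Types.h>
    #include <AK/Vector.h>

    // Expand a subsampled plane back to full resolution by repeating
    // each sample hfactor x vfactor times. Sketch only.
    static Vector<u8> upsample_nearest_neighbor(
        Vector<u8> const& subsampled, size_t sub_width,
        size_t sub_height, size_t hfactor, size_t vfactor)
    {
        size_t full_width = sub_width * hfactor;
        size_t full_height = sub_height * vfactor;
        Vector<u8> full;
        full.resize(full_width * full_height);
        for (size_t y = 0; y < full_height; ++y) {
            for (size_t x = 0; x < full_width; ++x) {
                full[y * full_width + x] =
                    subsampled[(y / vfactor) * sub_width + (x / hfactor)];
            }
        }
        return full;
    }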
frame() still returns a regular RGB Bitmap (now lazily converted
from internal CMYK data), but JPEGImageDecoderPlugin now also
implements cmyk_frame().
The decoder assumes that K's sampling factor matches Y's at the
moment. Better to error out than to silently render something
broken.
For YCCK, this is covered by ycck-2111.jpg in the tests.
We currently assume that the K (black) channel uses the same
sampling as the Y channel, so this already works as long as we
don't error out on it.
This is a hack: Ideally we'd have a CMYK Bitmap pixel format,
and we'd convert to RGB at blit time. Then we could also apply color
profiles (which for CMYK images are CMYK-based).
Also, the colors for our CMYK->RGB conversion are off for PDFs,
and we have distinct codepaths for this in Gfx::Color (for paths)
and JPEGs. So when we fix that, we'll have to fix it in two places.
But this doesn't require a lot of code and it's a huge visual
progression, so let's go with it for now.
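For reference, the kind of naive CMYK->RGB conversion in question
looks roughly like this (one common formula; not necessarily what
either of the two codepaths mentioned above does):

    #include <AK/Types.h>

    struct Rgb {
        u8 r;
        u8 g;
        u8 b;
    };

    // Treat C/M/Y as complements of R/G/B and apply the K (black)
    // channel multiplicatively. All values are 0-255. Sketch only.
    static Rgb naive_cmyk_to_rgb(u8 c, u8 m, u8 y, u8 k)
    {
        auto apply = [](u8 value, u8 key) {
            return static_cast<u8>((255 - value) * (255 - key) / 255);
        };
        return { apply(c, k), apply(m, k), apply(y, k) };
    }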
It is unlikely this is needed anymore, and as pointed out things should
now safely return OOM if the bitmap is too large to allocate.
Also, no recently added decoders respected this limit anyway.
Fixes #20872
The Adobe specification doesn't even consider JPEG images with a single
component. So let's not consider the content of the App14 segment for
grayscale images.
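A sketch of what that amounts to (the enum and names are made up):

    #include <AK/Optional.h>
    #include <AK/Types.h>

    enum class ColorTransform {
        Grayscale,
        YCbCr,
        CmykOrRgb,
    };

    // A single-component frame is grayscale no matter what the App14
    // "Adobe" segment claims. Sketch only.
    static ColorTransform effective_color_transform(
        size_t component_count,
        Optional<ColorTransform> app14_transform,
        ColorTransform fallback)
    {
        if (component_count == 1)
            return ColorTransform::Grayscale;
        return app14_transform.value_or(fallback);
    }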
The ideal size is the size at which the user will display the image. Raster
formats should ignore this parameter, but vector formats can use
it to generate a bitmap of the ideal size.
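For illustration, a caller could pass the display size like this
(the exact frame() signature and descriptor shape are assumed here,
not quoted from the patch):

    // Hypothetical caller: ask for the first frame at the size it
    // will be shown, so a vector decoder can rasterize to that size
    // while raster decoders just ignore the hint.
    static ErrorOr<RefPtr<Gfx::Bitmap>> bitmap_for_display(
        Gfx::ImageDecoder& decoder, Gfx::IntSize display_size)
    {
        auto descriptor = TRY(decoder.frame(0, display_size));
        return descriptor.image;
    }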
Decoding progressive JPEGs involves much more complicated logic
than sequential JPEGs. Thanks to template specialization, this
patch allows us to skip the additional cost of progressive images
when it's not needed.
It gives a nice 10% improvement on sequential JPEGs :^)
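One way to get that effect (names made up; the patch's actual
template parameters may differ):

    #include <AK/Error.h>

    enum class DecodingMode {
        Sequential,
        Progressive,
    };

    // The sequential instantiation of the hot per-data-unit function
    // contains no progressive-only bookkeeping at all. Sketch only.
    template<DecodingMode Mode>
    static ErrorOr<void> decode_data_unit(/* stream, coefficients, ... */)
    {
        if constexpr (Mode == DecodingMode::Progressive) {
            // Spectral selection / successive approximation handling
            // only exists here.
        }
        // Shared Huffman decoding and dequantization goes here.
        return {};
    }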
Some of these functions can be called millions of times even for images
of moderate size. Inlining these functions really helps the compiler and
gives performance improvements of up to 10%.
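For example, a helper of this size is a typical candidate
(hypothetical function; ALWAYS_INLINE is AK's force-inline macro):

    #include <AK/Platform.h>
    #include <AK/Types.h>

    // Called for every bit of entropy-coded data, so keeping it out
    // of the call-overhead budget matters. Sketch only.
    ALWAYS_INLINE u8 extract_bit(u8 byte, u8 bit_index)
    {
        return (byte >> (7 - bit_index)) & 1;
    }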
Inside each scan, raw data is read with the following rules:
- Each `0x00` that is preceded by `0xFF` should be discarded.
- If multiple `0xFF` bytes follow each other, only one of them is
  considered.
That, plus the fact that we don't know the size of the scan
beforehand, made us put a prepared stream in a vector for easy
later use.
This patch removes this duplication by performing these operations
in a stream-friendly way.
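A sketch of applying those two rules on the fly, over a plain AK
`Stream` (hypothetical helper, not the decoder's actual reader):

    #include <AK/Error.h>
    #include <AK/Stream.h>
    #include <AK/Try.h>
    #include <AK/Types.h>

    // Returns the next entropy-coded byte: 0xFF 0x00 yields a
    // literal 0xFF, and runs of 0xFF collapse to a single one.
    static ErrorOr<u8> read_entropy_coded_byte(Stream& stream)
    {
        u8 byte = TRY(stream.read_value<u8>());
        if (byte != 0xFF)
            return byte;
        u8 next = TRY(stream.read_value<u8>());
        while (next == 0xFF)
            next = TRY(stream.read_value<u8>());
        if (next == 0x00)
            return byte; // Stuffed zero: the 0xFF was literal data.
        // Anything else starts a marker (e.g. a restart marker); a
        // real decoder would dispatch on it instead of bailing out.
        return Error::from_string_literal("Unexpected marker in scan data");
    }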
This class is, to some extent, similar to an implementation of
`Stream`, but specialized for JPEG reading.
Under the hood, it is composed of a `Vector` used to buffer the
input, and it provides a simple interface to read bytes.
This patch paves the way for better, more specialized control over
the input data. The encapsulation already allows us to get rid of
the last `seek` in the decoder, and thus to use the decoder with a
raw `Stream`.
As it provides faster `read` routines, this patch already reduces
the time spent on reading.
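Roughly, such a class has this shape (heavily simplified sketch,
not the actual JPEGStream):

    #include <AK/Error.h>
    #include <AK/Stream.h>
    #include <AK/Try.h>
    #include <AK/Types.h>
    #include <AK/Vector.h>

    // Buffers chunks of the underlying stream in a Vector and hands
    // out bytes one at a time, tracking the absolute offset into the
    // stream. Sketch only.
    class BufferedJPEGReader {
    public:
        explicit BufferedJPEGReader(Stream& stream)
            : m_stream(stream)
        {
        }

        ErrorOr<u8> read_u8()
        {
            if (m_position == m_buffer.size())
                TRY(refill());
            ++m_absolute_offset;
            return m_buffer[m_position++];
        }

        // Relative to the start of the stream, not to the buffer.
        u64 byte_offset() const { return m_absolute_offset; }

    private:
        ErrorOr<void> refill()
        {
            m_buffer.resize(4096);
            auto filled = TRY(m_stream.read_some(m_buffer.span()));
            if (filled.is_empty())
                return Error::from_string_literal("Unexpected end of stream");
            m_buffer.resize(filled.size());
            m_position = 0;
            return {};
        }

        Stream& m_stream;
        Vector<u8> m_buffer;
        size_t m_position { 0 };
        u64 m_absolute_offset { 0 };
    };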