A tile is basically a strip with a user-defined width. With that in
mind, adding support for them is quite straightforward. As a lot of the
common code was named after 'strips', to avoid future confusion I
renamed everything that interacts with either strips or tiles to use a
more general term: 'segment'.
Note that tiled images are supposed to always have a 'TileOffsets' tag
instead of 'StripOffsets'. However, this doesn't seem to be enforced by
encoders, so we accept either of them interchangeably.
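For reference, a hedged sketch of the unifying idea (the struct is
illustrative, not the decoder's actual type):
```cpp
#include <cstdint>

// A tile and a strip only differ in where their dimensions come from:
// ImageWidth/RowsPerStrip for strips, TileWidth/TileLength for tiles.
// Offsets and byte counts come from StripOffsets/StripByteCounts or
// TileOffsets/TileByteCounts respectively; a strip is just a segment
// spanning the full image width.
struct Segment {
    uint32_t width;
    uint32_t height;
    uint64_t offset;
    uint64_t byte_count;
};
```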
The test case was generated with the following Python script:
import pyvips
img = pyvips.Image.new_from_file('deflate.tiff')
img.write_to_file('tiled.tiff',
                  compression=pyvips.ForeignTiffCompression.DEFLATE,
                  tile=True, tile_width=64, tile_height=64)
JPEGs can store a `restart_interval`, which controls how many
minimum coded units (MCUs) apart the stream state resets.
This can be used for error correction, decoding parts of a JPEG
in parallel, etc.
We tried to use

    u32 i = vcursor * context.mblock_meta.hpadded_count + hcursor;
    i % (context.dc_restart_interval *
         context.sampling_factors.vertical *
         context.sampling_factors.horizontal) == 0

to check if we hit a multiple of an MCU.
`hcursor` is the horizontal offset into 8x8 blocks, `vcursor` the
vertical offset, and `hpadded_count` stores how many 8x8 blocks
we have per row, padded to a multiple of the sampling factor.
This isn't quite right if `hcursor` isn't divisible by both
the vertical and horizontal sampling factors. Tweak things so
that they work.
Also rename `i` to `number_of_mcus_decoded_so_far` since that's
what it is, at least now.
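A minimal sketch of a counting scheme that stays correct (illustrative
names modeled on the snippet above; not the exact patch):
```cpp
#include <cstdint>

// Convert the 8x8-block cursor to MCU coordinates first, then count
// whole MCUs. This stays correct even when hcursor is not a multiple
// of the sampling factors.
uint32_t number_of_mcus_decoded_so_far(uint32_t vcursor, uint32_t hcursor,
    uint32_t hpadded_count, uint32_t vertical_factor, uint32_t horizontal_factor)
{
    uint32_t const mcu_row = vcursor / vertical_factor;
    uint32_t const mcu_column = hcursor / horizontal_factor;
    uint32_t const mcus_per_row = hpadded_count / horizontal_factor;
    return mcu_row * mcus_per_row + mcu_column;
}
// A restart marker is then expected whenever this count is a non-zero
// multiple of the restart interval.
```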
For the test case, I converted an existing image to a ppm:
Build/lagom/bin/image -o out.ppm \
    Tests/LibGfx/test-inputs/jpg/12-bit.jpg
Then I resized it to 102x77px in Photoshop and saved it again.
Then I turned it into a jpeg like so:
path/to/cjpeg \
    -outfile Tests/LibGfx/test-inputs/jpg/odd-restart.jpg \
    -sample 2x2,1x1,1x1 -quality 5 -restart 3B out.ppm
The trick here is to:
a) Pick a width that's not divisible by the data unit size (8), and
that when rounded up to a whole number of blocks (13) still isn't
divisible by the subsampling factor -- done by picking a width of 102
(see the arithmetic check after this list).
b) Pick a Huffman table that doesn't happen to contain the bit
pattern for a restart marker, so that reading a restart marker
from the bitstream as data causes a failure (-quality 5 happens
to do this).
c) Pick a restart interval where we fail to skip it if our calculation
is off (-restart 3B).
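As a quick sanity check of the arithmetic in a) (plain C++, not from
the commit):
```cpp
// 102 is not a multiple of the 8-pixel data unit size, it rounds up to
// 13 blocks per row, and 13 is not divisible by the horizontal sampling
// factor of 2 that "-sample 2x2,1x1,1x1" gives the first component.
static_assert(102 % 8 != 0);
static_assert((102 + 7) / 8 == 13);
static_assert(13 % 2 != 0);
```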
Together with #22987, this fixes #22780.
Non-interleaved files always have an MCU of one data unit.
(A "data unit" is an 8x8 tile of pixels, and an "MCU" is a
"minium coded unit", e.g. 2x2 data units for luminance and
1 data unit each for Cr and Cb for a YCrCb image with
4:2:0 subsampling.)
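For reference, a minimal standalone sketch of that relationship
(illustrative code, not the decoder's):
```cpp
#include <cstddef>
#include <cstdint>

struct Component {
    uint32_t horizontal_factor;
    uint32_t vertical_factor;
};

// In an interleaved scan, one MCU gathers horizontal * vertical data
// units for each component; a non-interleaved scan codes exactly one
// data unit per MCU. For 4:2:0 YCbCr this gives 2*2 + 1 + 1 = 6 units.
uint32_t data_units_per_mcu(Component const* components, size_t count, bool interleaved)
{
    if (!interleaved)
        return 1;
    uint32_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += components[i].horizontal_factor * components[i].vertical_factor;
    return total;
}
```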
For the test case, I converted an existing image to a ppm:
Build/lagom/bin/image -o out.ppm \
    Tests/LibGfx/test-inputs/jpg/12-bit.jpg
Then I converted it to grayscale and saved it as a pgm in Photoshop.
Then I turned it into a weird jpeg like so:
path/to/cjpeg \
    -outfile Tests/LibGfx/test-inputs/jpg/grayscale_mcu.jpg \
    -sample 2x2 -restart 3 out.pgm
Makes 3 of the 5 JPEGs that were failing to decode in #22780 decode
correctly.
When present, the alpha channel is also affected by the horizontal
differencing predictor.
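For reference, a minimal sketch of the idea (simplified to 8-bit
samples; standalone code, not the decoder's):
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Undo TIFF horizontal differencing (Predictor=2) on one RGBA scanline:
// every sample is stored as a delta from the matching sample one pixel
// to the left, and the alpha samples participate exactly like the color
// samples. Additions wrap modulo 256 for 8-bit samples.
void undo_horizontal_differencing(std::vector<uint8_t>& scanline, size_t samples_per_pixel)
{
    for (size_t i = samples_per_pixel; i < scanline.size(); ++i)
        scanline[i] += scanline[i - samples_per_pixel];
}
```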
The test case was generated with GIMP with the following steps:
- Open an RGB image
- Add a transparency layer
- Export as TIFF with the LZW compression scheme
This tag is required by the specification, but some encoders (at least
Krita) don't write it for images with a single strip.
The test file was generated by opening deflate.tiff in Krita and saving
it with DEFLATE compression.
Type 2 <=> One-dimensional Group3, customized for TIFF
Type 3 <=> Two-dimensional Group3, uses the original 1D internally
Type 4 <=> Two-dimensional Group4
So let's clarify that this is not Group3 1D but the TIFF variant, which
libtiff calls `CCITTRLE`. We stick with that name to avoid confusion.
Images with a display mask ("stencil" as it's called in DPaint) add
an extra bitplane which acts as a mask. For now, at least skip it
properly. Later we should render masked pixels as transparent, but
this requires some refactoring.
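A hedged sketch of the skipping rule, based on ILBM's BMHD masking
field (names are illustrative):
```cpp
#include <cstdint>

// When BMHD.masking is mskHasMask (1), every row stores one extra
// interleaved bitplane holding the stencil, which must be consumed
// even while it is not rendered yet.
uint8_t planes_to_read_per_row(uint8_t bmhd_planes, uint8_t bmhd_masking)
{
    constexpr uint8_t msk_has_mask = 1;
    return bmhd_planes + (bmhd_masking == msk_has_mask ? 1 : 0);
}
```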
We now allow all subsampling factors where the subsampling factors
of follow-on components evenly divide the ones of the first component.
In practice, this allows YCCK 2111, CMYK 2112, and CMYK 2111.
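A minimal sketch of that acceptance rule (illustrative types, not the
decoder's actual code):
```cpp
#include <cstddef>

struct Factors {
    int horizontal;
    int vertical;
};

// Every follow-on component's factors must evenly divide the first
// component's factors, e.g. 2111: 2 % 1 == 0, or 2112: 2 % 2 == 0.
bool subsampling_is_supported(Factors const* factors, size_t count)
{
    for (size_t i = 1; i < count; ++i) {
        if (factors[0].horizontal % factors[i].horizontal != 0
            || factors[0].vertical % factors[i].vertical != 0)
            return false;
    }
    return true;
}
```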
Some apps seem to generate malformed images that are accepted
by most readers. We now only throw if malformed data would lead to
a write outside the chunky buffer.
When using the BMP encoding, ICO images are expected to contain a 1-bit
mask for transparency. Even when an alpha channel is already included
in the image, the mask is still required. As stated here[1], the mask
is used to provide a shadow around the image.
Unfortunately, it seems that some encoders do not include that second
transparency mask. So let's read that mask only if some data still
remains after decoding the image.
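A hedged sketch of the size check (standalone helper, not the decoder's
actual code): ICO's AND mask is 1 bit per pixel with rows padded to
32-bit boundaries, which for a 16x16 icon comes out to the 64 bytes
mentioned below.
```cpp
#include <cstddef>

// Rows of the 1-bit AND mask are padded to 32-bit boundaries; a mask is
// only read if at least this many bytes remain after the image data.
size_t and_mask_size_in_bytes(size_t width, size_t height)
{
    size_t const row_bytes = ((width + 31) / 32) * 4;
    return row_bytes * height; // 16x16 -> 4 * 16 = 64 bytes
}
```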
The test case has been generated by truncating the last 64 bytes
(originally dedicated to the mask) from the `serenity.ico` file and
changing the declared size of the image in the ICO header. The size
value is stored at offset 0x0E in the file, and I changed the value
from 0x0468 to 0x0428.
[1]: https://devblogs.microsoft.com/oldnewthing/20101021-00/?p=12483
This fixes an issue where GIF images without a global color table would
have the first segment incorrectly interpreted as color table data.
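For reference, a hedged sketch of the relevant GIF89a logic (standalone
helpers, not the decoder's actual code):
```cpp
#include <cstddef>
#include <cstdint>

// Bit 7 of the logical screen descriptor's packed byte is the global
// color table flag; the low three bits encode the table size as
// 2^(n+1) RGB entries (3 bytes each). Only consume that many bytes as
// a color table when the flag is set.
bool has_global_color_table(uint8_t packed_fields)
{
    return (packed_fields & 0x80) != 0;
}

size_t global_color_table_size_in_bytes(uint8_t packed_fields)
{
    return 3 * (size_t(1) << ((packed_fields & 0x07) + 1));
}
```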
Makes many more screenshots appear on https://virtuallyfun.com/ :^)
TIFF files are made in a way that makes them easily extensible, and
over the years people have made sure to exploit that. In other words,
it's easy to find images with non-standard tags. Instead of returning
an error for those, let's just skip them.
Note that we need to make sure to realign the reading head in the file.
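A hedged sketch of what that realignment amounts to (the stream API
here is illustrative):
```cpp
#include <istream>

// An IFD entry is a fixed 12 bytes: u16 tag, u16 type, u32 count, and
// a u32 value-or-offset. Once the tag and type have been read and the
// entry is deemed unknown, skip the remaining 8 bytes so the next
// entry is read from the correct position.
void skip_unknown_ifd_entry(std::istream& stream)
{
    stream.seekg(8, std::ios::cur);
}
```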
The test case was originally a 10x10 checkerboard image with the
required tags, plus the `DocumentName` tag. Then, I modified this tag
in a hexadecimal editor and replaced its id with 30000 (stored as the
bytes 0x30 0x75, since it's a LE u16) and the type with the same value
as well. This is, AFAIK, never used as a custom TIFF tag, so this
should remain an invalid tag id and type.
We currently assume that the K (black) channel uses the same sampling
as the Y channel already, so this already works as long as we don't
error out on it.
TIFF images with the PhotometricInterpretation tag set to RGBPalette are
based on indexed colors instead of explicitly describing the color for
each pixel. Let's add support for them.
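A minimal sketch of the lookup, assuming TIFF 6.0's ColorMap layout
(three consecutive runs of 2^BitsPerSample 16-bit values: all reds,
then all greens, then all blues); the 16-to-8-bit scaling is a
simplification:
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    uint8_t r, g, b;
};

// Resolve one palette index against the ColorMap: red, green and blue
// live in three consecutive runs of entry_count 16-bit values each.
Rgb resolve_palette_index(std::vector<uint16_t> const& color_map, size_t bits_per_sample, size_t index)
{
    size_t const entry_count = size_t(1) << bits_per_sample;
    return Rgb {
        static_cast<uint8_t>(color_map[index] >> 8),
        static_cast<uint8_t>(color_map[entry_count + index] >> 8),
        static_cast<uint8_t>(color_map[2 * entry_count + index] >> 8),
    };
}
```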
The test case was generated with GIMP using the Indexed image mode after
adding an alpha layer. Not all decoders are able to open this image, but
GIMP can.
UnassociatedAlpha is the one used by GIMP when generating TIFF images
with transparency. Support is added for Grayscale and RGB images, as
those are the two types we support right now, but handling transparency
for other types should be really straightforward as well.
This compression (tag Compression=2) is not very popular on its own,
but it is a base for implementing the CCITT3 2D and CCITT4
compressions.
As the format has no real benefits, it is quite hard to find an app
that is willing to encode it for you. So I used the following program,
which calls `libtiff` directly:
```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

#include <tiffio.h>

// An array containing 0 and 1 of length width * height.
extern std::vector<uint8_t> array;

int main()
{
    // From: https://stackoverflow.com/a/34257789
    TIFF* image = TIFFOpen("input.tif", "w");

    int const width = 400;
    int const height = 300;

    TIFFSetField(image, TIFFTAG_IMAGEWIDTH, width);
    TIFFSetField(image, TIFFTAG_IMAGELENGTH, height);
    TIFFSetField(image, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_MINISWHITE);
    TIFFSetField(image, TIFFTAG_COMPRESSION, COMPRESSION_CCITTRLE);
    TIFFSetField(image, TIFFTAG_BITSPERSAMPLE, 1);
    TIFFSetField(image, TIFFTAG_SAMPLESPERPIXEL, 1);
    TIFFSetField(image, TIFFTAG_ROWSPERSTRIP, 1);

    std::vector<uint8_t> scan_line(width / 8 + 8, 0);
    for (int i = 0; i < height; i++) {
        std::fill(scan_line.begin(), scan_line.end(), 0);
        for (int x = 0; x < width; ++x) {
            // Pack pixels MSB-first, inverted since PHOTOMETRIC_MINISWHITE
            // means 0 is white.
            uint8_t eight_pixels = scan_line.at(x / 8);
            eight_pixels = eight_pixels << 1;
            eight_pixels |= !array.at(i * width + x);
            scan_line.at(x / 8) = eight_pixels;
        }
        // The last argument of TIFFWriteScanline() is the sample plane
        // (not a byte count); it's 0 for single-sample data.
        if (TIFFWriteScanline(image, scan_line.data(), i, 0) != 1)
            std::cerr << "Something went wrong\n";
    }
    TIFFClose(image);
}
```
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l '\bDeprecatedString\b|deprecated_string' \
      AK Userland Meta Ports Ladybird Tests Kernel)
$ perl -pi -e 's/\bDeprecatedString\b/ByteString/g;
      s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
      $(git diff --name-only | grep '\.cpp\|\.h')
$ gn format $(git ls-files '*.gn' '*.gni')
This change limits the amount of memory that is initially allocated for
the color table. This prevents an OOM condition if the file contains an
incorrect color table size.
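A hedged sketch of the pattern (illustrative names and limit, not the
decoder's actual code):
```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reserve at most a sane number of entries instead of trusting the
// count declared in the file; a lying header then costs a bounded
// allocation, and the table can still grow later if the data really
// is present.
void reserve_color_table(std::vector<uint32_t>& table, size_t declared_count)
{
    constexpr size_t max_initial_reservation = 1024;
    table.reserve(std::min(declared_count, max_initial_reservation));
}
```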
Previously, the regression tests for OSS-Fuzz issues 62033 and 63296
used test case files directly from OSS-Fuzz. These files are invalid
in multiple ways because they have been generated by a fuzzer. This
commit replaces these files with ones that only expose the issue being
tested.