0ct0pu5/ladybird

Author	SHA1	Message	Date
Nico Weber	7dd5457b8f	LibGfx/JBIG2: Add support for refinement coding template 1 This is used when refining a symbol in 0000337.pdf.	2024-03-25 13:16:02 -04:00
Nico Weber	ef9bfce0e7	LibGfx/JBIG2: Add support for SDREFAGG=1 symbol segments ...but only as long as REFAGGNINST == 1. That's enough for 0000337.pdf. Except that it also needs GRTEMPLATE=1 support in the generic refinement region decoding procedure, so no behavior change yet.	2024-03-25 13:16:02 -04:00
Nico Weber	3fa2ecdd65	LibGfx/JBIG2: Extract read_id() into a class We'll need this for refinement/aggregate coding of symbols.	2024-03-25 13:16:02 -04:00
Nico Weber	68d47cb84a	LibGfx/JBIG2: Implement support for symbols segments with input symbols Needed for 0000337.pdf. It now fails complaining about missing SDREFAGG support.	2024-03-25 13:16:02 -04:00
Nico Weber	59e6a10f30	LibGfx/JBIG2: Initialize POD members of refinement region input struct I missed putting this in #23696 while juggling local branches. No behavior change.	2024-03-25 12:07:18 -04:00
Nico Weber	8e9157d6ce	LibGfx/JBIG2: Implement decode_end_of_stripe() a bit This is enough to be able to decode 0000857.pdf p1-4 and 0000372.pdf p11.	2024-03-25 14:08:40 +01:00
Nico Weber	c4a45bb521	LibGfx/JBIG2: Make compute_context() a function pointer ...instead of a lambda that checks the template on every call. Doesn't make a performance difference locally, but seems maybe nicer? No behavior change.	2024-03-25 14:08:40 +01:00
Nico Weber	828c640087	LibGfx/JBIG2: Make get_pixel static constexpr ...so it doesn't need to be captured.	2024-03-25 14:08:40 +01:00
Nico Weber	b45a4508c7	LibGfx/JBIG2: Implement support for context templates 1, 2, and 3 Template 2 is needed by some symbols in 0000372.pdf page 11 and 0000857.pdf pages 1-4. Implement the others too while here. (The mentioned pages in those two PDFs also use the "end of stripe" segment, so they still don't render yet.) We still don't support EXTTEMPLATE.	2024-03-25 14:08:40 +01:00
Nico Weber	7035c2a2ff	LibGfx/JBIG2: Add some debug logging to decode_page_information()	2024-03-25 14:08:40 +01:00
Nico Weber	d2998c1f5e	LibGfx/JBIG2: Implement generic_refinement_region_decoding_procedure() With this, we can decode all pages of 0000425.pdf, 0000215.pdf, 0000882.pdf, and 0000057.pdf.	2024-03-25 08:15:36 +01:00
Nico Weber	0d2e91b4ea	LibGfx/JBIG2: Reject things in refinement decoding These aren't hit for my 1000 page PDF test set.	2024-03-25 08:15:36 +01:00
Nico Weber	562d8ed619	LibGfx/JBIG2: Stub out generic_refinement_region_decoding_procedure() ...and make text_region_decoding_procedure() call it. generic_refinement_region_decoding_procedure() still just returns "unimplemented", so no behavior change yet.	2024-03-25 08:15:36 +01:00
Nico Weber	c4c48c1d5f	LibGfx/JBIG2: Sketch out text segment refinement coding a bit	2024-03-25 08:15:36 +01:00
Nico Weber	9f327833c0	LibGfx/JBIG2: Read refinement adaptive template pixels for text segments Text segments using refinement are still rejected later, by text_region_decoding_procedure(). But we deserialize the input data now, and the error when this feature is used is now slightly different.	2024-03-25 08:15:36 +01:00
Nico Weber	ced21d8419	LibGfx/JBIG2: Call decode_immediate_text_region for lossless text region It seems to do the right thing already, and nothing in the spec says not to do this as far as I can tell. With this, we can finally decode the test input from #23659. See `f391c7822d` for a similar change for generic regions and lossless generic regions.	2024-03-23 17:30:15 -04:00
Nico Weber	b15e1d2b2a	LibGfx/JBIG2: Implement initial support for text segments Text segments conceptually store (x,y,id) triples. (x,y) are a coordinate, and id refers to an id from a symbol segment. A text segment has the effect of drawing some of the bitmaps stored in a symbol segment to the output bitmap. For example, the symbol segment might contain a small bitmap that happens to look like the letter 'A', and the text segment might draw that everywhere a scanned page has an 'A'. (The JBIG2 format only treats it as an abstract bitmap. It doesn't know that this small bitmap is an 'A'.) This is missing support for many things: * Huffman-coded input (not used in practice) * Symbol refinement * Transposed symbols * Colors (not used in practice) Still, we now have basic symbol/text segment support. This is enough to decode the downloadable PDF here: https://www.google.com/books/edition/Paradise_Lost/6qdbAAAAQAAJ It doesn't lead to any progression on my 1000 file test PDF set. The 7 files in there that use JBIG2 with symbol and text segments now fail to load for other reasons (4 need symbol refinement for text segments, one needs end-of-stripe segment support, one needs support for symbol segments referring to other segments). (And possibly, many other PDFs from Google Books, but that's the only one I've tried so far.)	2024-03-23 17:30:15 -04:00
Nico Weber	3454970903	LibGfx/JBIG2: Extract composite_bitbuffer() and add some features This extracts the bitbuffer combining code we had into a new function composite_bitbuffer() and adds the following features: * Real support for combination operators (which also lets us allow black as background color again, even if that's never used in practice) * Clipping support (not used here yet, but will be needed elsewhere soon) We're going to need this for text segment handling. No behavior change.	2024-03-23 17:30:15 -04:00
Nico Weber	754e1b46fc	LibGfx/JBIG2: Implement basic symbol segment processing A symbol segment defines a bunch of small bitmaps and associates them with numeric IDs. This only implements reading symbols encoded with the arithmetic coder. It does not support huffman coding. (In practice, everything seems to use arithmetic coding.) Support for refinement or aggregate coding isn't implemented yet. Support for retaining bitmap coding contexts isn't implemented yet. Support for symbol segments referring to other symbol segments isn't implemented yet. But all produce diagnostics if encountered, so we won't forget about them. (I haven't seen either being used in the wild.) No visible behavior change yet, but with JBIG2_DEBUG turned on, it produces all kinds of debug output.	2024-03-23 17:30:15 -04:00
Nico Weber	93fcb529cf	LibGfx/JBIG2: Move SegmentData down a bit Symbol segments will store decoded symbols, and for that SegmentData needs to come after BitBuffer. No behavior change.	2024-03-23 17:30:15 -04:00
Nico Weber	2099ca48a1	LibGfx/JBIG2: Pass in decoder and contexts to generic region decoder The symbol segment decoding procedure will read generic regions that aren't at a byte boundary, and that share contexts across several regions. No behavior change.	2024-03-23 17:30:15 -04:00
Nico Weber	376b1a2309	LibGfx/JBIG2: Have just one CombinationOperator enum class We already had two, and we would need another one for text segments. No behavior change.	2024-03-23 17:30:15 -04:00
Nico Weber	c06110da87	LibGfx/JBIG2: Make AdaptiveTemplatePixel toplevel We're going to need it for symbol segment decoding too. No behavior change.	2024-03-23 17:30:15 -04:00
Nico Weber	8e82c2b932	LibGfx/JBIG2: Add arithmetic integer decoder The existing ArithmeticDecoder (from Annex E) reads one bit at a time. ArithmeticIntegerDecoder (from Annex A) builds on top of that to read integer values. This will be used by both the symbol segment and the text segment readers. (This does not yet implement the IAID decoding procedure in A.3. We only need that one in the text segment decoder at the moment, and it's pretty small, so I'll put it inline there for now.) Not used yet, so no behavior change yet.	2024-03-23 17:30:15 -04:00
Nico Weber	c99506da7d	LibGfx/JBIG2: Initialize POD members And use Array<> instead of C-style arrays.	2024-03-23 17:30:15 -04:00
Nico Weber	7650e657aa	LibGfx/JBIG2: Implement support for TPGDON	2024-03-17 17:38:30 +01:00
Nico Weber	f391c7822d	LibGfx/JBIG2: Call decode_immediate_generic_region for lossless regions It seems to do the right thing already, and nothing in the spec says not to do this as far as I can tell. With this, we can finally decode Tests/LibGfx/test-inputs/jbig2/bitmap.jbig2 and add a test for decoding simple arithmetic-coded images.	2024-03-16 09:21:42 -04:00
Nico Weber	6788a82ec5	LibGfx/JBIG2: Implement generic_region_decoding_procedure() happy path This errors out on many special cases. None of those seem to be hit in practice (with the exception of TPGDON, which is used in a handful of PDFs. I have an implementation of that locally, but I'll put that in a separate PR. The code for it is straightforward, but adding a test for it is a bit involved.) With this, we can decode about half of the JBIG2 images in my PDF test dataset.	2024-03-16 09:21:42 -04:00
Nico Weber	b0c73d1652	LibGfx/JBIG2: Reject unimplemented combination operators In practice, everything uses white backgrounds and operators `or` or `xor` to turn them black, at least for the simple images we're about to be able to decode. To make sure we don't forget implementing this for real once needed, reject other ops, and also reject black backgrounds (because 1 | 0 is 1, not 0 like our overwrite implementation will produce). This means we have to remove a test, but since this scenario doesn't seem to happen in practice, that seems ok.	2024-03-16 09:21:42 -04:00
Nico Weber	5dc9ead1c5	LibGfx/JBIG2: Expand a comment	2024-03-16 09:21:42 -04:00
Nico Weber	21c54839e6	LibGfx/JBIG2: Add two dbgln_if()s	2024-03-16 09:21:42 -04:00
Nico Weber	b8f80501ec	LibGfx/JBIG2: Pass Context to get_next_bit() instead of to initialize() The context can vary for every bit we read. This does not affect the one use in the test which reuses the same context for all bits, but it is necessary for future changes.	2024-03-16 09:21:42 -04:00
Nico Weber	df9dd8ec69	LibGfx/JBIG2: Add arithmetic coding decoder I think the context normally changes for every bit. But this here is enough to correctly decode the test bitstream in Annex H.2 in the spec, which seems like a good checkpoint. The internals of the decoder use spec naming, to make the code look virtually identical to what's in the spec. (Even so, I managed to put in several typos that took a while to track down.)	2024-03-14 18:18:15 -06:00
Nico Weber	98729c97f4	LibGfx/JBIG2: Simplify and restrict adaptive template pixel reading EXTTEMPLATE=1 was added later and doesn't seem to be used much in practice -- it doesn't appear in any simple generic region in any PDF I tested so far at least. Since the spec contradicts itself on what to do with these as far as I can tell, error out on them for now and then add support once we find actual files using this, so that we can check our implementation actually works. Deduplicate the data reading for the different cases, and zero-initialize all adaptive template pixels to make that possible. Other than prohibiting EXTTEMPLATE=1, no behavior change.	2024-03-14 10:57:57 -04:00
Nico Weber	596b06333f	LibGfx/JBIG2: Add a dbgln_if(JBIG2_DEBUG) for non-MMR generic regions	2024-03-14 10:57:57 -04:00
Nico Weber	7740aeca29	LibGfx/JBIG2: Fix size bound in scan_for_immediate_generic_region_size() The memmem() call passes `data.size() - 19 - sizeof(u32)` for big_len, (18 prefix bytes skipped, the flag byte, and the trailing u32), so the buffer needs to be at least that large. Should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67332	2024-03-13 22:01:06 -06:00
Nico Weber	bc42144642	LibGfx/JBIG2: Start implementing the generic region decoding procedure If the "MMR" bit is set, the generic region decoding procedure just uses ITU-T T.6 (2D CCITT), which we already have an implementation of. In practice, this is used almost never in .jbig2 files and in none of the PDFs I have. The two files that do use MMR are: 1) JBIG2_ConformanceData-A20180829/F01_200_TT9.jb2 2) 042_3.jb2 from the ghostpdl tests The first uses an immediate _lossless_ generic region, which we don't implement yet (but I think it should just forward to the normal immediate generic region code? Not in this commit, though). The second uses a regular immediate generic region, and we can decode it now: Build/lagom/bin/image -o out.png \ path/to/ghostpdl/tests/jbig2/042_3.jb2	2024-03-11 16:48:57 +01:00
Nico Weber	e0af3ae8d9	LibGfx/JBIG2: Implement decode_end_of_file()	2024-03-11 16:48:57 +01:00
Nico Weber	323cacc593	LibGfx/JBIG2: Implement decode_end_of_page()	2024-03-11 16:48:57 +01:00
Nico Weber	bdbc21c52d	LibGfx/JBIG2: Implement conversion to Gfx::Bitmap and ByteBuffer With this, `image` can convert any jbig2 file, as long as it's black (or white), and LibPDF can draw jbig2 files (again, as long as they only contain a single color stored in just a PageInformation segment).	2024-03-10 10:10:55 -04:00
Nico Weber	54982857bd	LibGfx/JBIG2: Implement decode_page_information() Also make scan_for_page_size() not early return, so that it has the same behavior as the main decoding loop. (Multiple page information segments for a single page are likely invalid and don't happen in practice, so this is mostly an academic change.) Add a BitBuffer class to store the bit image data, and introduce a Page struct for storing data associated with a page. We currently only handle a single page, but a) this makes it easier to decode multiple pages in the future if we want b) it makes the code easier to understand.	2024-03-10 10:10:55 -04:00
Nico Weber	4b01f2f158	LibGfx/JBIG2: Implement decode_extension() Only logs the data to dbgln(). All jb2 files in ghostpdl/tests start with this segment.	2024-03-10 10:10:55 -04:00
Nico Weber	9cd0c5658e	LibGfx/JBIG2: Reject files with delayed height information for now 7.4.8.2 Page bitmap height: "In some cases, this value may not be known at the time that the page information segment is written. In this case, this field must contain 0xFFFFFFFF, and the actual page height may be communicated later, once it is known."	2024-03-10 10:10:55 -04:00
Nico Weber	f592a2ac72	LibPDF/JBIG2: Print a warning on files with more than one page	2024-03-10 10:10:55 -04:00
Nico Weber	2caf603836	LibGfx/JBIG2: Add scaffolding for interpreting segment data	2024-03-10 10:10:55 -04:00
Nico Weber	af20ebe4a0	LibGfx/JBIG2: Scan for page size of page "1" Sounds like the spec guarantees that that's the number of the first page. (In practice, all but one of the jbig2 files I've found contain just page 1. PDFs almost always contain just page 1, and very rarely a page 0 for globally shared parameters.)	2024-03-10 10:10:55 -04:00
Nico Weber	8f4930f2df	LibGfx/JBIG2: Scan for the first PageInformation segment and decode it This allows `file` to correctly print the dimensions of a .jbig2 file, and it allows us to write a test that covers much of the code written so far.	2024-03-09 16:01:22 +01:00
Nico Weber	1eaaa8c3e9	LibPDF+LibGfx: Support JBIG2s with /JBIG2Globals set Several ramifications: * /JBIG2Globals is an indirect reference, which means we now need a Document for unfiltering. (Technically, other decode parameters can also be indirect objects and we should use the Document to resolve() those too, but in practice it only seems to be needed for /JBIG2Globals.) * Since /JBIG2Globals are so rare, we just parse once for each image that uses them, and decode_embedded() now receives a Vector<ReadonlyBytes> with all sections of sequences of segments. * Internally, decode_segment_headers() is now called several times for embedded JBIG2s with multiple such sections (e.g. PDFs with /JBIG2Globals). * That means `data` is now no longer part of JBIG2LoadingContext and things get slightly reshuffled due to this. This completes the LibPDF part of JBIG2 support. Once LibGfx implements actual decoding of JBIG2s, things should start to Just Work in PDFs.	2024-03-09 16:01:22 +01:00
Nico Weber	09ca66cb8b	LibGfx/JBIG2: Scan for end of immediate generic regions of unknown size Found e.g. in 0000033.pdf (both pages).	2024-03-09 16:01:22 +01:00
Nico Weber	379ef45688	LibGfx/JBIG2: Store location of segment data bodies They're in different places for Sequential/Embedded (right after the header) and RandomAccess (which has all headers first, followed by all data bits next). We don't do anything with the data yet, but now everything's in place to actually process segment data.	2024-03-09 16:01:22 +01:00
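
Commits `b0c73d1652` and `3454970903` above both turn on how a decoded region is combined with the page: on a black background, `1 | 0` has to stay 1, which a plain overwrite gets wrong. The following is a minimal sketch of per-pixel combination with clipping. `Bitmap`, `combine()`, and `composite()` are hypothetical stand-ins written for this note, not LibGfx's `BitBuffer` or `composite_bitbuffer()`; the five operators (OR, AND, XOR, XNOR, REPLACE) are the ones the JBIG2 spec defines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the LibGfx types.
enum class CombinationOperator { Or, And, Xor, XNor, Replace };

struct Bitmap {
    int width { 0 };
    int height { 0 };
    std::vector<bool> bits; // row-major, true = black (1)

    Bitmap(int w, int h, bool fill = false)
        : width(w), height(h), bits(static_cast<size_t>(w) * static_cast<size_t>(h), fill)
    {
        assert(w >= 0 && h >= 0);
    }
    bool get(int x, int y) const { return bits[static_cast<size_t>(y) * width + x]; }
    void set(int x, int y, bool v) { bits[static_cast<size_t>(y) * width + x] = v; }
};

// Combines one destination pixel with one source pixel.
inline bool combine(bool dst, bool src, CombinationOperator op)
{
    switch (op) {
    case CombinationOperator::Or: return dst || src;
    case CombinationOperator::And: return dst && src;
    case CombinationOperator::Xor: return dst != src;
    case CombinationOperator::XNor: return dst == src;
    case CombinationOperator::Replace: return src;
    }
    return src; // unreachable for valid enum values
}

// Draws `src` onto `dst` with its top-left corner at (x_offset, y_offset).
// Source pixels that fall outside `dst` are clipped (skipped).
inline void composite(Bitmap& dst, Bitmap const& src, int x_offset, int y_offset, CombinationOperator op)
{
    for (int y = 0; y < src.height; ++y) {
        int dy = y + y_offset;
        if (dy < 0 || dy >= dst.height)
            continue;
        for (int x = 0; x < src.width; ++x) {
            int dx = x + x_offset;
            if (dx < 0 || dx >= dst.width)
                continue;
            dst.set(dx, dy, combine(dst.get(dx, dy), src.get(x, y), op));
        }
    }
}
```

Compositing an all-white region onto an all-black page with `Or` leaves the page black, while `Replace` (the overwrite behavior) turns the covered pixels white, which is the discrepancy the commit message points at.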

55 commits
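
The size-bound fix in `7740aeca29` concerns an unsigned subtraction: an expression like `data.size() - 19 - sizeof(u32)` wraps around to a huge value when the buffer is shorter than the bytes being subtracted. A minimal sketch of the guard, assuming a two-byte end marker searched for between a fixed-size prefix and a fixed-size trailer, might look like this. `find_end_marker()` and its parameters are hypothetical and use `std::search` rather than `memmem()`; this is not the code of `scan_for_immediate_generic_region_size()`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper for illustration. Searches for `marker` in the window
// [prefix_size, data.size() - trailer_size) and returns its offset from the
// start of `data`, or nullopt if it is absent or the buffer is too small.
inline std::optional<size_t> find_end_marker(std::vector<uint8_t> const& data,
    size_t prefix_size, size_t trailer_size, std::array<uint8_t, 2> marker)
{
    // Check before subtracting: size_t is unsigned, so computing
    // data.size() - prefix_size - trailer_size on a short buffer would wrap.
    if (data.size() < prefix_size + trailer_size)
        return std::nullopt;

    auto begin = data.begin() + static_cast<std::ptrdiff_t>(prefix_size);
    auto end = data.end() - static_cast<std::ptrdiff_t>(trailer_size);

    auto it = std::search(begin, end, marker.begin(), marker.end());
    if (it == end)
        return std::nullopt;
    return static_cast<size_t>(it - data.begin());
}
```

Called with a prefix of 19 bytes and a trailer of 4 bytes, the helper rejects any buffer shorter than 23 bytes up front instead of handing a wrapped length to the search.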