0ct0pu5/ladybird

Author	SHA1	Message	Date
Nico Weber	df9dd8ec69	LibGfx/JBIG2: Add arithmetic coding decoder I think the context normally changes for every bit. But this here is enough to correctly decode the test bitstream in Annex H.2 in the spec, which seems like a good checkpoint. The internals of the decoder use spec naming, to make the code look virtually identical to what's in the spec. (Even so, I managed to put in several typos that took a while to track down.)	2024-03-14 18:18:15 -06:00
Nico Weber	98729c97f4	LibGfx/JBIG2: Simplify and restrict adaptive template pixel reading EXTTEMPLATE=1 was added later and doesn't seem to be used much in practice -- it doesn't appear in any simple generic regions in any PDF I tested so far at least. Since the spec contradicts itself on what to do with these as far as I can tell, error out on them for now and then add support once we find actual files using this, so that we can check our implementation actually works. Deduplicate the data reading for the different cases, and zero-initialize all adaptive template pixels to make that possible. Other than prohibiting EXTTEMPLATE=1, no behavior change.	2024-03-14 10:57:57 -04:00
Nico Weber	596b06333f	LibGfx/JBIG2: Add a dbgln_if(JBIG2_DEBUG) for non-MMR generic regions	2024-03-14 10:57:57 -04:00
Nico Weber	7740aeca29	LibGfx/JBIG2: Fix size bound in scan_for_immediate_generic_region_size() The memmem() call passes `data.size() - 19 - sizeof(u32)` for big_len (18 prefix bytes skipped, the flag byte, and the trailing u32), so the buffer needs to be at least that large. Should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67332	2024-03-13 22:01:06 -06:00
Nico Weber	bc42144642	LibGfx/JBIG2: Start implementing the generic region decoding procedure If the "MMR" bit is set, the generic region decoding procedure just uses ITU-T T.6 (2D CCITT), which we already have an implementation of. In practice, this is almost never used in .jbig2 files and in none of the PDFs I have. The two files that do use MMR are: 1.) JBIG2_ConformanceData-A20180829/F01_200_TT9.jb2 2.) 042_3.jb2 from the ghostpdl tests The first uses an immediate _lossless_ generic region, which we don't implement yet (but I think it should just forward to the normal immediate generic region code? Not in this commit, though). The second uses a regular immediate generic region, and we can decode it now: Build/lagom/bin/image -o out.png \ path/to/ghostpdl/tests/jbig2/042_3.jb2	2024-03-11 16:48:57 +01:00
Nico Weber	e0af3ae8d9	LibGfx/JBIG2: Implement decode_end_of_file()	2024-03-11 16:48:57 +01:00
Nico Weber	323cacc593	LibGfx/JBIG2: Implement decode_end_of_page()	2024-03-11 16:48:57 +01:00
Nico Weber	bdbc21c52d	LibGfx/JBIG2: Implement conversion to Gfx::Bitmap and ByteBuffer With this, `image` can convert any jbig2 file, as long as it's black (or white), and LibPDF can draw jbig2 files (again, as long as they only contain a single color stored in just a PageInformation segment).	2024-03-10 10:10:55 -04:00
Nico Weber	54982857bd	LibGfx/JBIG2: Implement decode_page_information() Also make scan_for_page_size() not early return, so that it has the same behavior as the main decoding loop. (Multiple page information segments for a single page are likely invalid and don't happen in practice, so this is mostly an academic change.) Add a BitBuffer class to store the bit image data, and introduce a Page struct for storing data associated with a page. We currently only handle a single page, but a) this makes it easier to decode multiple pages in the future if we want b) it makes the code easier to understand.	2024-03-10 10:10:55 -04:00
Nico Weber	4b01f2f158	LibGfx/JBIG2: Implement decode_extension() Only logs the data to dbgln(). All jb2 files in ghostpdl/tests start with this segment.	2024-03-10 10:10:55 -04:00
Nico Weber	9cd0c5658e	LibGfx/JBIG2: Reject files with delayed height information for now 7.4.8.2 Page bitmap height: "In some cases, this value may not be known at the time that the page information segment is written. In this case, this field must contain 0xFFFFFFFF, and the actual page height may be communicated later, once it is known."	2024-03-10 10:10:55 -04:00
Nico Weber	f592a2ac72	LibPDF/JBIG2: Print a warning on files with more than one page	2024-03-10 10:10:55 -04:00
Nico Weber	2caf603836	LibGfx/JBIG2: Add scaffolding for interpreting segment data	2024-03-10 10:10:55 -04:00
Nico Weber	af20ebe4a0	LibGfx/JBIG2: Scan for page size of page "1" Sounds like the spec guarantees that that's the number of the first page. (In practice, all but one of the jbig2 files I've found contain just page 1. PDFs almost always contain just page 1, and very rarely a page 0 for globally shared parameters.)	2024-03-10 10:10:55 -04:00
Nico Weber	8f4930f2df	LibGfx/JBIG2: Scan for the first PageInformation segment and decode it This allows `file` to correctly print the dimensions of a .jbig2 file, and it allows us to write a test that covers much of the code written so far.	2024-03-09 16:01:22 +01:00
Nico Weber	1eaaa8c3e9	LibPDF+LibGfx: Support JBIG2s with /JBIG2Globals set Several ramifications: * /JBIG2Globals is an indirect reference, which means we now need a Document for unfiltering. (Technically, other decode parameters can also be indirect objects and we should use the Document to resolve() those too, but in practice it only seems to be needed for /JBIG2Globals.) * Since /JBIG2Globals are so rare, we just parse once for each image that uses them, and decode_embedded() now receives a Vector<ReadonlyBytes> with all sections of sequences of segments. * Internally, decode_segment_headers() is now called several times for embedded JBIG2s with multiple such sections (e.g. PDFs with /JBIG2Globals). * That means `data` is now no longer part of JBIG2LoadingContext and things get slightly reshuffled due to this. This completes the LibPDF part of JBIG2 support. Once LibGfx implements actual decoding of JBIG2s, things should start to Just Work in PDFs.	2024-03-09 16:01:22 +01:00
Nico Weber	09ca66cb8b	LibGfx/JBIG2: Scan for end of immediate generic regions of unknown size Found e.g. in 0000033.pdf (both pages).	2024-03-09 16:01:22 +01:00
Nico Weber	379ef45688	LibGfx/JBIG2: Store location of segment data bodies They're in different places for Sequential/Embedded (right after the header) and RandomAccess (which has all headers first, followed by all data bits next). We don't do anything with the data yet, but now everything's in place to actually process segment data.	2024-03-09 16:01:22 +01:00
Nico Weber	953f6c5d9b	LibPDF+LibGfx: Pass jbig2-filtered data to JBIG2ImageDecoderPlugin Except for /JBIG2Globals, which we bail out on for now. In my 1000 files, 13 use JBIG2, and of those, 2 use JBIG2Globals (0000372.pdf e.g. page 11 and 0000857.pdf e.g. page 1), and only one (the latter) of the two uses the same JBIG2Globals stream for more than a single image. JBIG2ImageDecoderPlugin cannot decode the data yet, so no behavior change, but with `#define JBIG2_DEBUG 1` at the top of that file, it now prints segment header info for PDFs containing JBIG2 data :^)	2024-03-09 16:01:22 +01:00
Nico Weber	b1fdc33a22	LibGfx/JBIG2: Decode all segment headers With `#define JBIG2_DEBUG 1` at the top of the file: % Build/lagom/bin/image --no-output \ .../JBIG2_ConformanceData-A20180829/F01_200_TT10.jb2 JBIG2LoadingContext: Organization: 0 (Sequential) JBIG2LoadingContext: Number of pages: 1 Segment number: 0 Segment type: 48 Referred to segment count: 0 Segment page association: 1 Segment data length: 19 Segment number: 1 Segment type: 39 Referred to segment count: 0 Segment page association: 1 Segment data length: 12666 Segment number: 2 Segment type: 49 Referred to segment count: 0 Segment page association: 1 Segment data length: 0 Runtime error: JBIG2ImageDecoderPlugin: Draw the rest of the owl	2024-03-09 16:01:22 +01:00
Nico Weber	177664cfae	LibGfx/JBIG2: Add an initial decode_segment_header() With `#define JBIG2_DEBUG 1` at the top of the file: % Build/lagom/bin/image --no-output \ .../JBIG2_ConformanceData-A20180829/F01_200_TT10.jb2 JBIG2LoadingContext: Organization: 0 (Sequential) JBIG2LoadingContext: Number of pages: 1 Segment number: 0 Segment type: 48 Referred to segment count: 0 Segment page association: 1 Segment data length: 19 Runtime error: JBIG2ImageDecoderPlugin: Draw the rest of the owl	2024-03-09 16:01:22 +01:00
Nico Weber	5cefcad2fe	LibGfx/JBIG2: Decode the file header Running `image` with `#define JBIG2_DEBUG 1` now prints number of pages.	2024-03-09 16:01:22 +01:00
Nico Weber	58838db445	LibGfx: Add the start of a JBIG2 loader JBIG2 is infamous for two things: 1. It's used in Xerox scanners, where it falsifies scanned numbers: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning 2. It was allegedly used in an iOS zero day, in a very cool way: https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html Needless to say, we need support for it in Serenity. (...because it's used in PDF files.) This adds all the scaffolding, but no actual implementation yet. It's enough for `file` to print the mime type of .jb2 files, but `image` can't do anything with the files yet.	2024-03-09 16:01:22 +01:00

23 commits
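
Commit 7740aeca29 above describes a size bound: the search length is `data.size() - 19 - sizeof(u32)`, so the buffer must hold at least that many bytes before the subtraction is safe. The following is a minimal sketch of that kind of guard, not the actual LibGfx code. The function name `searchable_length` and the use of `std::optional` are assumptions made for illustration; only the constants (18 prefix bytes, one flag byte, a trailing u32) come from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper for illustration; not the actual LibGfx function.
// Returns how many bytes remain to search after skipping the 18 prefix
// bytes and the flag byte, and reserving room for the trailing u32.
std::optional<std::size_t> searchable_length(std::size_t data_size)
{
    constexpr std::size_t skipped = 19;                    // 18 prefix bytes + 1 flag byte
    constexpr std::size_t trailer = sizeof(std::uint32_t); // trailing u32

    // Check before subtracting: size_t is unsigned, so a too-small
    // buffer would otherwise wrap around to a huge length.
    if (data_size < skipped + trailer)
        return std::nullopt;
    return data_size - skipped - trailer;
}
```

A caller would treat an empty result as a decoding error and only pass a present value on as the haystack length for the search.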