0ct0pu5/ladybird

Author	SHA1	Message	Date
Kyle Pereira	8191f2b47a	LibPDF: Add parameter for background color of render	2023-12-10 16:44:24 +01:00
Kyle Pereira	60c4803dd3	LibPDF: Pass Renderer to ColorSpace	2023-12-10 16:44:24 +01:00
Kyle Pereira	082a4197b6	LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces This is in anticipation of Pattern color space support which does not yield a simple color.	2023-12-10 16:44:24 +01:00
Nico Weber	832a065687	LibPDF: For low-bpp images, start scanlines on byte boundaries Required per spec, and we get slanted images without it. Fixes e.g. page 1 of 0000749.pdf.	2023-12-07 08:10:40 +00:00
Nico Weber	06b9633da5	LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors to 8-bit nicely, but is the wrong thing to do for palette indices. Stop doing this for palette indices. Fixes "Indexed color space index out of range" for 11 files in the PDF/A 0000.zip test set now that we correctly handle palette indices as of the previous commit: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
Nico Weber	f34da6396f	LibPDF: Update font size after getting font from cache Page 1 of 0000277.pdf does: BT 22 0 0 22 59 28 Tm /TT2 1 Tf (Presented at Photonics West OPTO, February 17, 2016) Tj ET BT 32 0 0 32 269 426 Tm /TT1 1 Tf (Robert W. Boyd) Tj ET BT 22 0 0 22 253 357 Tm /TT2 1 Tf (Department of Physics and) Tj ET BT 22 0 0 22 105 326 Tm /TT2 1 Tf (Max-Planck Centre for Extreme and Quantum Photonics) Tj ET Every line begins a text operation, then updates the font matrix, selects a font (TT2, TT1, TT2, TT1), draws some text and ends the text operation. `Tm` (which sets the font matrix) contains a scale, and uses that to update the font size of the currently-active font (cf #20084). But in this file, we `Tm` first and `Tf` (font selection) second, so this updates the size of the old font. So when we pull it out of the cache again on line 3, it would still have the old size from the `Tm` on line 2. (The whole text scaling logic in LibPDF imho needs a rethink; the current approach also causes issues with zero-width glyphs which currently lead to divisions by zero. But that's for another PR.) Fixes another regression from `c8510b58a3` (which I've accidentally referred to by 2340e834cd in another commit).	2023-11-26 19:05:13 -05:00
Nico Weber	eb1c99bd72	LibPDF+LibGfx: Make SMasks on jpeg images work SMasks are greyscale images that get used as alpha channel for a different image. JPEGs in PDFs are stored as streams with /DCTDecode filters, and we have a separate code path for loading those in the PDF renderer. That code path just calls our JPEG decoder, which creates bitmaps with format BGRx8888. So when we process an SMask for such a bitmap, we have to change the bitmap's format to BGRA8888 in addition to setting alpha values on all pixels.	2023-11-23 12:13:03 +01:00
Nico Weber	4440452f92	LibPDF: Support images with 1, 2, 4 bits per pixel They just get upsampled to 8 bits per pixel images.	2023-11-18 07:33:15 +00:00
Nico Weber	29396415d5	LibPDF: Add an initial implementation of type 3 glyph rendering This is a very inefficient implementation: Every time a type 3 font glyph is drawn, we parse its operator stream and execute all the operators therein. We'll want to instead cache the glyphs in bitmaps (at least in most cases), like we do for other fonts. But it's a good first step, and all the coordinate math seems to work in the files I've tested. Good test files from pdfa dataset 0000.zip: - 0000559.pdf page 1 (and 2): Has a non-default font matrix; text appears mirrored if the font matrix isn't handled correctly - 0000425.pdf, page 1: Draws several glyphs in a single run; glyphs overlap if Renderer::render_type3_glyph() ignores the passed-in point - 0000211.pdf, any page: Uses type 3 glyphs for all text. Good perf test (already "reasonably fast") - 0000521.pdf, page 5 (or 7 or 16): The little red flag in the purple box is a type 3 font glyph, and it's colored (which in part means the first operator is `d0`, while all the other documents above use `d1`)	2023-11-17 19:47:53 +00:00
Nico Weber	14ddab5519	LibPDF: Stub out type3_font_set_glyph_width* Type 3 font glyphs begin with either `d0` or `d1`. If we bail out with an "unsupported" error on the very first operator in a glyph, we'll never paint the glyph. Just stub these out for now. We probably want to do more in here in the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).	2023-11-17 19:47:53 +00:00
Nico Weber	5513f8bbe3	LibPDF: Move ScopedState from a function on Renderer into Renderer No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	bcc6439b5f	LibPDF: Pass Renderer to PDFFont::draw_string() It's a bit unfortunate that fonts need to know about the renderer, but type 3 fonts contain PDF drawing operators, so it's necessary. On the bright side, it makes it possible to pass fewer parameters around and compute things locally as needed. (As we implement more fonts, we'll probably want to create some functions to do these computations in a central place, eventually.) No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	5eaa403ddf	LibPDF: Use font dictionary object as cache key, not resource name In the main page contents, /T0 might refer to a different font than it might refer to in an XObject. So don't use the `Tf` argument as font cache key. Instead, use the address of the font dictionary object. Fixes false cache sharing, and also allows us to share cache entries if the same font dict is referred to by two different names. Fixes a regression from 2340e834cd (but keeps the speed-up intact).	2023-11-17 19:14:39 +01:00
Nico Weber	9b022239c3	LibPDF: Apply all offsets of TJ operator TJ acts on a list of either strings or numbers. The strings are drawn, and the numbers are treated as offsets. Previously, we'd only apply the last-seen number as offset when we saw a string. That had the effect of us ignoring all but the last number in front of a string, and ignoring numbers at the end of the list. Now, we apply all numbers as offsets. Our rendering of Tests/LibPDF/text.pdf now matches other PDF viewers.	2023-11-14 10:11:09 +01:00
Lucas CHOLLET	9e4d697d23	LibPDF: Detect DCT images correctly Images can have multiple filters, which are processed sequentially. Only the last one is relevant for the image format (DCT or JPXDecode), so use the last filter instead of the first one to detect that property.	2023-11-13 10:30:34 -05:00
Nico Weber	3dca11c4e2	LibPDF: Move color space creation from name or array into ColorSpace No behavior change.	2023-11-05 14:27:22 -07:00
Nico Weber	8b806183f6	LibPDF: Tolerate indirect objects in various image dict values 0000101.pdf from 0000.zip from the pdfa dataset has /Height set to an indirect object that contains an int. Make that work, and make sure various other similar places getting values of the image dict also resolve indirect references.	2023-10-26 10:58:45 +02:00
Nico Weber	1a58fee0fd	LibPDF: Don't assert on named simple color space If a PDF uses `/CustomName cs` and `/CustomName` then points at just a name like `/DeviceGray` instead of an array, that's ok. Just using `/DeviceGray cs` is simpler, so this extra level of indirection is somewhat rare in practice, but it's valid and it does happen. So support it. We already have a helper that does the right thing that we just need to call. Together with #21524 and #21525, reduces the number of crashes on 300 random PDFs from the web (the first 300 from 0000.zip from https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/) from 29 (9%) to 25 (8%).	2023-10-21 21:04:26 +02:00
Nico Weber	34cb506bad	LibPDF: Replace another TODO with a message Like `ca1a98ba9f`, but for stroke color.	2023-10-21 09:09:06 +02:00
Nico Weber	9442782881	LibPDF: Implement text_next_line_show_string_set_spacing Not used terribly often, but e.g. used in 000333.pdf page 17 in stillhq.com-pdfdb.	2023-10-20 14:24:31 -04:00
Nico Weber	78dea9500f	LibPDF: Make operator parsing use ReadonlySpan instead of Vector No behavior change.	2023-10-20 14:24:31 -04:00
Nico Weber	aea0e2f313	LibPDF: Rename ColorSpaceFamily function to may_be_specified_directly() It used to be called ColorSpaceFamily::never_needs_parameters(). But in the cpp file, the macro arg was called ever_needs_parameters, and the spec says "If the color space is one that can be specified by a name and no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain cases of Pattern), the name may be specified directly." so let's use that language here. No behavior change.	2023-10-20 10:35:54 -06:00
Nico Weber	f5d3f47af3	LibPDF: Add spec comment about color spaces on images	2023-10-20 08:58:52 +02:00
Nico Weber	7c24a89acf	LibPDF: Add spec comment about valid bits_per_component values	2023-10-20 08:58:52 +02:00
Nico Weber	64bb9aa8c7	LibPDF: Fix comment typo	2023-10-20 08:58:52 +02:00
Nico Weber	ea6fed627a	LibPDF: Get color rendering intent from image dict Still not used for anything, so no behavior change.	2023-10-20 08:58:52 +02:00
Nico Weber	708d5e2fe6	LibPDF: Implement color_rendering_intent operator Implements the `ri` operator, and the `RI` key in a graphics state dictionary. We don't do anything yet with the color rendering intent except store it. No behavior change except removing a few "not yet implemented" messages.	2023-10-19 16:51:16 -04:00
Nico Weber	609e640530	LibPDF: Try harder to use a RAII object to restore state Follow-up to #21489. There, I made us use a RAII object. That's great, but if the embedded instruction stream pushes its own graphics state, then an early return would cause us to not process graphics state pop instructions in the embedded stream. To fix this, remember the graphics stack depth before entering the nested instruction stream, and explicitly shrink the stack back to that size upon exit. Enables us to render all pages of https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf without crashing.	2023-10-19 16:49:00 -04:00
Nico Weber	b835d2bd66	LibPDF: Use a RAII object to restore state in recursive render Previously, if one operator returned an error, the TRY() would cause us to return without restoring the outer graphics state, leading to problems such as handing a 3-tuple to a grayscale color space (because the inner object set up a grayscale color space that we failed to dispose of). Makes us crash later on page 43 of https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf	2023-10-18 19:43:31 -04:00
Nico Weber	3c2d820391	LibPDF: If softmask has different size than target bitmap, resize it The sizes of the smask and the image aren't guaranteed to be equal by the spec (...except for /Matte, see page 555 of the PDF 1.7 spec, but we don't implement that), and in practice they sometimes aren't. Fixes an assert on page 4 of https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf We now make it all the way to page 43 of 64 before crashing.	2023-10-18 20:03:35 +01:00
Nico Weber	c8510b58a3	LibPDF: Cache fonts per page Previously, every time a page switched fonts, we'd completely re-parse the font. Now, we cache fonts in Renderer, effectively caching them per page. It'd be nice to have an LRU cache across pages too, but that's a bigger change, and this already helps a lot. Font size is part of the cache key, which means we re-parse the same font at different font sizes. That could be better too, but again, it's a big help as-is already. Takes rendering the 1310 pages of the PDF 1.7 reference with `Build/lagom/bin/pdf --debugging-stats ~/Downloads/pdf_reference_1-7.pdf` from 71s to 11s :^) Going through pages, especially in the index, is noticeably snappier. (On the PDF 2.0 spec, ISO_32000-2-2020_sponsored.pdf, it's less dramatic: from 19s to 16s.)	2023-10-11 07:10:19 +02:00
MacDue	6088374ad2	LibPDF: Ensure all subpaths are closed before filling paths This lets us correctly draw figure 3.4 in pdf_reference_1-7.pdf.	2023-07-25 13:42:40 +02:00
Nico Weber	ca1a98ba9f	LibPDF: Replace two more crashes with messages	2023-07-23 23:05:32 -04:00
Nico Weber	29c3a9c5f0	LibPDF: Don't crash on images without /Filter Fixes a crash rendering page 819 of ISO_32000-2-2020_sponsored.pdf which contains an uncompressed 2x2 1bpp grayscale bitmap.	2023-07-23 23:04:55 -04:00
Nico Weber	18b86b1868	LibPDF: Apply text matrix scale to character and word spacing	2023-07-22 12:24:29 -04:00
Nico Weber	e3cc05b935	LibPDF: Don't ignore word_spacing	2023-07-22 12:24:29 -04:00
Nico Weber	6caaffa134	LibPDF: Add a few FIXMEs to set_graphics_state_from_dict	2023-07-21 08:17:12 +02:00
Matthew Olsson	5f8fd47214	LibPDF: Resize fonts when the text and line matrices change	2023-07-20 06:56:41 +01:00
Matthew Olsson	9a0e1dde42	LibPDF: Propagate errors from ColorSpace::color()	2023-07-20 06:56:41 +01:00
Nico Weber	c625ba34fe	LibPDF: Implement set_flatness_tolerance We now track it in the graphics state. It isn't used for anything yet. Fixes the one thing that rendering the first 100 pages of pdf_reference_1-7.pdf complains about.	2023-07-12 18:22:52 -04:00
Nico Weber	69c965b987	LibPDF: Move code to compute full page contents into Page Pure code move, no behavior change.	2023-07-12 18:22:35 -04:00
Julian Offenhäuser	4ec01669fc	LibPDF: Scale vector paths with the view This ensures that lines have the correct size at every scale factor.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	731676c041	LibPDF: Accept floats as line dash pattern phases	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	b90a794d78	LibPDF: Allow pages with no specified contents The contents object may be omitted as per spec, which will just leave the page blank.	2023-03-25 16:27:30 -06:00
Andreas Kling	8a48246ed1	Everywhere: Stop using NonnullRefPtrVector This class had slightly confusing semantics and the added weirdness doesn't seem worth it just so we can say "." instead of "->" when iterating over a vector of NNRPs. This patch replaces NonnullRefPtrVector<T> with Vector<NNRP<T>>.	2023-03-06 23:46:35 +01:00
Rodrigo Tobar	bf61f94413	LibPDF: Don't crash when a font hasn't been loaded yet This could happen because there was a problem while loading the first font in the document.	2023-03-02 12:18:53 +01:00
Rodrigo Tobar	79b4293687	LibPDF: Prevent crashes when loading XObject streams These streams might need a Filter that isn't implemented yet, and thus cannot be blindly MUST()-ed.	2023-03-02 12:18:53 +01:00
Rodrigo Tobar	cb04e4e9da	LibPDF: Refactor Font classes The PDFFont class hierarchy was very simple (a top-level PDFFont class, followed by all the children classes that derived directly from it). While this design was good enough for some things, it didn't correctly model the actual organization of font types: * PDF fonts are first divided between "simple" and "composite" fonts. The latter is the Type0 font, while the rest are all simple. * PDF fonts yield a glyph per "character code". Simple fonts' char codes are always 1 byte long, while Type0 char codes are of variable size. To this effect, this commit changes the hierarchy of Font classes, introducing a new SimpleFont class, deriving from PDFFont, and acting as the parent of Type1Font and TrueTypeFont, while Type0 still derives from PDFFont directly. This distinction now allows us to: * Model string rendering differently for simple and composite fonts: PDFFont now offers a generic draw_string method that takes a whole string to be rendered instead of a single char code. SimpleFont implements this as a loop over individual bytes of the string, with T1 and TT implementing draw_glyph for drawing a single char code. * Some common fields between T1 and TT fonts now live under SimpleFont instead of under PDFFont, where they previously resided. * Some other interfaces specific to SimpleFont have been cleaned up, with u16/u32 not appearing on these classes (or in PDFFont) anymore. * Type0Font's rendering still remains unimplemented. As part of this exercise I also took the chance to perform the following cleanups and restructurings: * Refactored the creation and initialisation of fonts. They are all centrally created at PDFFont::create, with a virtual "initialize" method that allows them to initialise their inner members in the correct order (parent first, child later) after creation. * Removed duplicated code. * Cleaned up some public interfaces: receive const refs, removed unnecessary ctors/dtors, etc. * Slightly changed how Type1 and TrueType fonts are implemented: if there's an embedded font, it takes priority; otherwise we always look for a replacement. * This means we don't do anything special for the standard fonts. The only behavior previously associated with standard fonts was choosing an encoding, and even that was under questioning.	2023-02-24 20:16:50 +01:00
Rodrigo Tobar	db9fa7ff07	LibPDF: Allow show_text to return errors Errors can (and do) occur when trying to render text, and so far we've silently ignored them, making us think that all is well when it isn't. Letting show_text return errors will allow us to inform the user about these errors instead of hiding them.	2023-02-24 20:16:50 +01:00
Rodrigo Tobar	82bac7e665	LibPDF: Fix clipping of painting operations While the clipping logic was correct (current vs. new clipping path), the clipping path contents weren't. This commit fixes that. We calculate the clipping path in two places: when we set it to be the whole page at graphics state creation time, and when we perform clipping path intersection to calculate a new clipping path. The clipping path is then used to limit painting by passing it to the painter (more precisely, by passing its bounding box to the painter, as the latter doesn't support arbitrary path clipping). For this last point the clipping path must be in device coordinates. There was however a mix of coordinate systems involved in the creation, update and usage of the clipping path: * The initial values of the path (i.e., the whole page) were in user coordinates. * Clipping path intersection was performed against m_current_path, which is in device coordinates. * To perform the clipping operation, the current clipping path was assumed to be in user coordinates. This mix resulted in the clipping not working correctly depending on the zoom level at which one visualised a page. This commit fixes the issue by always keeping track of the clipping path in device coordinates. This means that the initial full-page contents are now converted to device coordinates before putting them in the graphics state, and that no mapping is performed when applying the clipping to the painter.	2023-02-04 12:29:57 +01:00


153 commits
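Commit 832a065687 relies on the rule that every row of image sample data starts on a byte boundary. The following is a minimal standard-C++ sketch of that rule, assuming a single-component image whose samples are packed high-order bit first; `bytes_per_row` and `read_sample` are hypothetical helpers, not LibPDF functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not LibPDF's API): number of bytes one scanline
// occupies when every row is padded so the next one starts on a byte boundary.
std::size_t bytes_per_row(std::size_t width, unsigned bits_per_component, std::size_t components)
{
    assert(bits_per_component == 1 || bits_per_component == 2 || bits_per_component == 4 || bits_per_component == 8);
    return (width * bits_per_component * components + 7) / 8; // round up to whole bytes
}

// Hypothetical helper: reads sample `x` of row `y` from a single-component
// image. Samples are packed high-order bit first within each byte.
std::uint8_t read_sample(std::vector<std::uint8_t> const& data, std::size_t width,
    unsigned bits_per_component, std::size_t x, std::size_t y)
{
    std::size_t row_start = y * bytes_per_row(width, bits_per_component, 1);
    std::size_t bit_offset = x * bits_per_component;
    std::uint8_t byte = data[row_start + bit_offset / 8];
    unsigned shift = 8 - bits_per_component - static_cast<unsigned>(bit_offset % 8);
    return static_cast<std::uint8_t>((byte >> shift) & ((1u << bits_per_component) - 1));
}
```

For a 3-pixel-wide 1 bpp image, each row occupies one byte and its low 5 bits are padding. Reading the data as one continuous bit stream would instead start row 1 at bit 3 of the first byte, shifting each row further than the last, which is the slanting the commit describes.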
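Commits 4440452f92 and 06b9633da5 distinguish two ways of turning a 1, 2 or 4 bit sample into 8 bits. The sketch below illustrates the difference in standard C++; `upsample_color_sample` and `upsample_palette_index` are hypothetical names used only for illustration, not LibPDF functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widens an n-bit *color* sample to 8 bits by repeating
// its bit pattern, e.g. 4-bit 0b1101 -> 0b11011101. For n dividing 8 this
// equals value * 255 / (2^n - 1), so 0 stays 0 and the maximum maps to 255.
std::uint8_t upsample_color_sample(std::uint8_t value, unsigned bits)
{
    assert(bits == 1 || bits == 2 || bits == 4 || bits == 8);
    assert(value < (1u << bits));
    unsigned result = 0;
    for (unsigned filled = 0; filled < 8; filled += bits)
        result = (result << bits) | value;
    return static_cast<std::uint8_t>(result);
}

// Hypothetical helper: palette indices must NOT be widened that way. Index 13
// has to stay 13 so that it selects entry 13 of the /Indexed lookup table.
std::uint8_t upsample_palette_index(std::uint8_t value)
{
    return value;
}
```

Bit repetition is a good fit for color intensities because it spreads the n-bit range evenly over 0 to 255. Applied to an index it produces out-of-range lookups, which matches the "index out of range" errors mentioned in 06b9633da5.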
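Commits 5eaa403ddf and f34da6396f change how fonts are looked up in the per-page cache. The sketch below shows the two ideas together, keying on the identity of the font dictionary rather than the `Tf` resource name, and refreshing the size on every lookup. It is a simplification under stated assumptions: `FontDictionary`, `LoadedFont` and `FontCache` are hypothetical, and the key here omits the font size that c8510b58a3 says was part of LibPDF's key at the time.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for a parsed font dictionary and a loaded font.
struct FontDictionary {
    std::string base_font;
};

struct LoadedFont {
    std::string base_font;
    float size;
};

class FontCache {
public:
    // Keyed on the address of the font dictionary, not on the resource name
    // passed to `Tf`: /T0 in the page and /T0 in an XObject may be different
    // fonts, and two names may refer to the same dictionary.
    std::shared_ptr<LoadedFont> get(FontDictionary const* dictionary, float size)
    {
        auto& font = m_fonts[dictionary];
        if (!font) {
            font = std::make_shared<LoadedFont>(LoadedFont { dictionary->base_font, size });
            ++m_load_count;
        }
        // A cached font may still carry the size from an earlier lookup,
        // so refresh it every time it is pulled out of the cache.
        font->size = size;
        return font;
    }

    int load_count() const { return m_load_count; }

private:
    std::map<FontDictionary const*, std::shared_ptr<LoadedFont>> m_fonts;
    int m_load_count = 0;
};
```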
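Commits eb1c99bd72 and 3c2d820391 describe using an /SMask as the alpha channel of another image, switching a BGRx bitmap to BGRA, and resizing the mask when its dimensions differ. Below is a standalone sketch of those steps in standard C++. `Bitmap`, `GrayImage`, `resize_nearest` and `apply_smask` are hypothetical, and nearest-neighbor resampling is an assumed stand-in, not necessarily the scaling LibPDF uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit bitmap, pixels stored as 0xAARRGGBB. In BGRx form the
// top byte is unused; in BGRA form it holds alpha.
struct Bitmap {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint32_t> pixels;
    bool has_alpha; // false = BGRx, true = BGRA
};

// Hypothetical decoded /SMask: one 8-bit gray value per pixel.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> values;
};

// Nearest-neighbor resampling of the mask to the target size.
GrayImage resize_nearest(GrayImage const& source, std::size_t width, std::size_t height)
{
    GrayImage result { width, height, std::vector<std::uint8_t>(width * height) };
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            std::size_t source_x = x * source.width / width;
            std::size_t source_y = y * source.height / height;
            result.values[y * width + x] = source.values[source_y * source.width + source_x];
        }
    }
    return result;
}

// Writes the gray values into the alpha byte and marks the bitmap as BGRA,
// which matters for bitmaps that came out of a JPEG decoder as BGRx.
void apply_smask(Bitmap& image, GrayImage mask)
{
    assert(image.pixels.size() == image.width * image.height);
    if (mask.width != image.width || mask.height != image.height)
        mask = resize_nearest(mask, image.width, image.height);
    for (std::size_t i = 0; i < image.pixels.size(); ++i)
        image.pixels[i] = (image.pixels[i] & 0x00FFFFFFu) | (static_cast<std::uint32_t>(mask.values[i]) << 24);
    image.has_alpha = true;
}
```

Setting the alpha bytes alone is not enough for a BGRx bitmap, because a consumer that trusts the format flag will ignore them; that is why the sketch flips `has_alpha` as well.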
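Commit 9b022239c3 makes every number in a `TJ` array act as an offset. The sketch below models that in standard C++, using the PDF text-space rule that a number n moves the position by -n/1000 times the font size times the horizontal scaling. `TJElement`, `show_tj` and the fixed per-byte advance are simplifying assumptions for illustration, not LibPDF's types or glyph metrics.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of a TJ array: each element is a string or a number.
using TJElement = std::variant<std::string, double>;

struct PlacedRun {
    std::string text;
    double x; // horizontal start position in text space
};

struct TJResult {
    std::vector<PlacedRun> runs;
    double end_x; // text position after the whole array
};

// Sketch only: glyph widths are simplified to a fixed advance per byte.
// Every number moves the position, including consecutive numbers and
// numbers after the last string.
TJResult show_tj(std::vector<TJElement> const& elements, double font_size,
    double horizontal_scaling, double advance_per_byte)
{
    TJResult result { {}, 0.0 };
    for (auto const& element : elements) {
        if (auto const* offset = std::get_if<double>(&element)) {
            result.end_x -= *offset / 1000.0 * font_size * horizontal_scaling;
            continue;
        }
        auto const& text = std::get<std::string>(element);
        result.runs.push_back({ text, result.end_x });
        result.end_x += static_cast<double>(text.size()) * advance_per_byte;
    }
    return result;
}
```

With `[(AB) -500 250 (C) 1000]` at size 10, both numbers before `(C)` apply (net +2.5), and the trailing 1000 still moves the final position. The behavior described in the commit before the fix would have used only the 250 and ignored the trailing number.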
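Commits b835d2bd66 and 609e640530 restore the graphics state with a RAII object that remembers the stack depth and shrinks back to it, so an early error return (such as `TRY()` propagating an error) cannot leave a nested stream's state behind. The following standard-C++ sketch is modeled on that description only; `Renderer`, `GraphicsState` and `render_nested_stream` here are hypothetical and much smaller than LibPDF's real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GraphicsState {
    int color_components = 3; // e.g. 3 for DeviceRGB, 1 for DeviceGray
};

// Hypothetical renderer reduced to its graphics state stack.
class Renderer {
public:
    std::vector<GraphicsState> state_stack { GraphicsState {} };

    // Remembers the stack depth on entry and shrinks back to it on exit, so an
    // early return also discards states pushed by the nested stream's own `q`
    // operators whose matching `Q` never ran.
    class ScopedState {
    public:
        explicit ScopedState(Renderer& renderer)
            : m_renderer(renderer)
            , m_depth(renderer.state_stack.size())
        {
            renderer.state_stack.push_back(renderer.state_stack.back());
        }
        ~ScopedState()
        {
            if (m_renderer.state_stack.size() > m_depth)
                m_renderer.state_stack.resize(m_depth);
        }
        ScopedState(ScopedState const&) = delete;
        ScopedState& operator=(ScopedState const&) = delete;

    private:
        Renderer& m_renderer;
        std::size_t m_depth;
    };
};

// Simulates a nested content stream that runs `q`, switches to DeviceGray,
// and then fails before reaching its `Q` operators.
bool render_nested_stream(Renderer& renderer, int extra_pushes)
{
    Renderer::ScopedState scope(renderer);
    for (int i = 0; i < extra_pushes; ++i)
        renderer.state_stack.push_back(renderer.state_stack.back()); // `q`
    renderer.state_stack.back().color_components = 1;                // DeviceGray
    return false; // early error return
}
```

A destructor that only popped one entry would have fixed the single-push case from b835d2bd66 but not the nested `q` case that 609e640530 addresses; restoring to a recorded depth covers both.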
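Commit 82bac7e665 keeps the clipping path in device coordinates from the start: the initial full-page rectangle is mapped once, intersections happen in device space, and the bounding box goes to the painter unchanged. The sketch below reduces clip paths to axis-aligned rectangles, which is an assumption for brevity; `Matrix`, `Rect`, `to_device` and `intersect` are hypothetical, with the matrix following the PDF `[a b c d e f]` convention.

```cpp
#include <algorithm>
#include <cassert>

struct Point {
    double x;
    double y;
};

struct Rect {
    double min_x;
    double min_y;
    double max_x;
    double max_y;
};

// PDF-style matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
struct Matrix {
    double a, b, c, d, e, f;
    Point map(Point p) const { return { a * p.x + c * p.y + e, b * p.x + d * p.y + f }; }
};

// Maps a user-space rectangle to device space and returns its bounding box.
Rect to_device(Rect const& user, Matrix const& ctm)
{
    assert(user.min_x <= user.max_x && user.min_y <= user.max_y);
    Point corners[4] = {
        ctm.map({ user.min_x, user.min_y }), ctm.map({ user.max_x, user.min_y }),
        ctm.map({ user.min_x, user.max_y }), ctm.map({ user.max_x, user.max_y }),
    };
    Rect box { corners[0].x, corners[0].y, corners[0].x, corners[0].y };
    for (Point const& p : corners) {
        box.min_x = std::min(box.min_x, p.x);
        box.min_y = std::min(box.min_y, p.y);
        box.max_x = std::max(box.max_x, p.x);
        box.max_y = std::max(box.max_y, p.y);
    }
    return box;
}

// Intersects two rectangles that are both already in device space.
Rect intersect(Rect const& a, Rect const& b)
{
    Rect r { std::max(a.min_x, b.min_x), std::max(a.min_y, b.min_y),
        std::min(a.max_x, b.max_x), std::min(a.max_y, b.max_y) };
    if (r.min_x > r.max_x || r.min_y > r.max_y)
        return { 0, 0, 0, 0 }; // empty intersection
    return r;
}
```

Because both operands of `intersect` live in the same coordinate system, the result no longer depends on the zoom level, which was the symptom the commit fixes.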