beenull/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-26 01:20:25 +00:00

Author	SHA1	Message	Date
Tim Schumacher	a2f60911fe	AK: Rename GenericTraits to DefaultTraits This feels like a more fitting name for something that provides the default values for Traits.	2023-11-09 10:05:51 -05:00
Nico Weber	bbd86ee4f3	LibPDF: Implement ExponentialInterpolationFunction	2023-11-06 10:01:05 +01:00
Nico Weber	1aed465efe	LibPDF: Implement Function::create()	2023-11-06 10:01:05 +01:00
Nico Weber	b78ea81de5	LibPDF: Implement SeparationColorSpace Requires PDF::Function, which isn't implemented yet, so this has no visual effect yet.	2023-11-06 10:01:05 +01:00
Nico Weber	9204252d02	LibPDF: Add scaffolding for function objects See PDF 1.7 Spec, "3.9 Functions".	2023-11-06 10:01:05 +01:00
Nico Weber	21894f1cde	LibPDF: Fix typos in DeviceN colorspace scaffolding * Compare array size to 3 and 4, not 4 and 5 * Fix literal typo in error message Fixes crash processing 0000906.pdf from 0000.zip from the pdf/a dataset.	2023-11-06 09:54:01 +01:00
Nico Weber	30ea218e35	LibPDF: Implement IndexedColorSpace	2023-11-05 14:27:22 -07:00
Nico Weber	0b087c02a3	LibPDF: Add spec link to default_decode()	2023-11-05 14:27:22 -07:00
Nico Weber	3dca11c4e2	LibPDF: Move color space creation from name or array into ColorSpace No behavior change.	2023-11-05 14:27:22 -07:00
Nico Weber	1dfd49ef99	LibPDF: Implement LabColorSpace	2023-11-05 14:27:22 -07:00
Nico Weber	4a5136fc8c	LibPDF: Implement CalGrayColorSpace I haven't seen this being used in the wild, but it's used in Tests/LibPDF/colorspaces.pdf.	2023-11-04 17:02:37 -04:00
Nico Weber	a207ab709a	LibPDF: In convert_to_srgb(), also apply sRGB curve (ish) We did convert from the input space to linear space and then to linear sRGB, but we forgot to re-apply gamma. This uses the x^2.2 curve instead of the real sRGB curve for now.	2023-11-04 17:02:37 -04:00
Nico Weber	641365b235	LibPDF: Move colorspace conversion functions up a bit No code change, no behavior change. Pure code move.	2023-11-04 17:02:37 -04:00
Nico Weber	f8799885de	LibPDF: Clamp sRGB channels before converting to u8 in CalRGB code Sometimes the numbers end up just slightly above 1.0f, which previously caused an overflow.	2023-11-01 11:45:13 -04:00
Nico Weber	bdd2404453	LibPDF: Ignore input whitepoint in convert_to_d65() CalRGBColorSpace::color() converts into a flat xyz space, which already takes input whitepoint into account. It shouldn't be taken into account again when converting from the flat color space to D65.	2023-11-01 11:45:13 -04:00
Nico Weber	e35a5da2fb	LibPDF: Update dead link in a comment	2023-11-01 11:45:13 -04:00
Nico Weber	1fcf0142d2	LibPDF: Fix unfortunate typo in CalRGBColorSpace::create() We always ignored the /Matrix key in /CalRGB dicts.	2023-11-01 11:45:13 -04:00
Nico Weber	d24289eef4	LibPDF: Always log unhandled type 1 and type 2 font program opcodes This would've made it easy to see that we were missing flex opcodes for https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf	2023-11-01 11:40:16 -04:00
Nico Weber	e1a743f286	LibPDF: Implement type 2 flex, hflex, hflex1, flex1 operators This is the type 2 equivalent to type 1 othersubr, from what I can tell. See "4.1 Path Construction Operators" in 5177.Type2.pdf, "The Type 2 Charstring Format". Makes text show up alright on https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf	2023-11-01 11:40:16 -04:00
Nico Weber	3e707efdfa	LibPDF: Move type1 subr 0 handling into othersubr handler https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf, 8.4 First Four Subrs Entries: """If Flex or hint replacement is used in a Type 1 font program, the first four entries in the Subrs array in the Private dictionary must be assigned charstrings that correspond to the following code sequences. If neither Flex nor hint replacement is used in the font program, then this requirement is removed, and the first Subrs entry may be a normal charstring subroutine sequence. The first four Subrs entries contain: Subrs entry number 0: 3 0 callothersubr pop pop setcurrentpoint return """ othersubr handler 0 gets three arguments: * The flex height (the distance after which the bezier splines are replaced with just straight lines) * The current position after the flex (x and y). It pushes that position on the postscript stack, from where predefined subr handler number 0 then pops it. It then passes it to setcurrentpoint. In theory, we now correctly do that setcurrentpoint call, which we previously didn't. In practice, that setcurrentpoint call always receives the last point of the flex -- and our path api apparently gets confused when move_to() is called on it when the current point is already at that same location. So tweak the SetCurrentPoint handler to not set the current point on the path if it's already the path's current point, with a FIXME to figure out what exactly is happening in Gfx::Path. No big behavior change if flex is used, but this is more correct if it isn't. (This only works because our `return` handler is empty, else we would have to make the callothersubr handler start a call frame.)	2023-11-01 11:38:41 -04:00
Nico Weber	0bb8249780	LibPDF: Move type1 subr 1 and 2 handling into othersubr handler https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf, 8.4 First Four Subrs Entries: """If Flex or hint replacement is used in a Type 1 font program, the first four entries in the Subrs array in the Private dictionary must be assigned charstrings that correspond to the following code sequences. If neither Flex nor hint replacement is used in the font program, then this requirement is removed, and the first Subrs entry may be a normal charstring subroutine sequence. The first four Subrs entries contain: [...] Subrs entry number 1: 0 1 callothersubr return Subrs entry number 2: 0 2 callothersubr return """ So subr entry numbers 1 and 2 just call othersubr 1 and 2, which means we can just move the handling code over. No behavior change if flex is used, but more correct if it isn't. (This only works because our `return` handler is empty, else we would have to make the callothersubr handler start a call frame.)	2023-11-01 11:38:41 -04:00
Ali Mohammad Pur	78c04cb8b2	AK+LibPDF: Make Format print floats in a roundtrip-safe way by default Previously we assumed a default precision of 6, which made the printed values quite odd in some cases. This commit changes that default to print them with just enough precision to produce the exact same float when roundtripped. This commit adds some new tests that assert exact format outputs, which have to be modified if we decide to change the default behaviour.	2023-10-31 09:12:35 +03:30
Nico Weber	4cc24548f6	LibPDF: Call dbgln() for unimplemented flex opcodes	2023-10-28 13:28:05 -04:00
Nico Weber	e484fae8e1	LibPDF: Don't do special subr processing for type 2 CFFs This is a subset of #21484: Type 2 CFFs never use the special subrs, so stop doing them for type 2 at least for now. Fixes an assert in 0000064.pdf in 0000.zip in the pdfa dataset (a stack underflow because a subr is supposed to push a bunch of stuff, but we ran one of the built-in routines instead of the subr from the font file). As discussed in #21484, this isn't right for type 1 CFFs either, but just removing the code there regresses Tests/LibPDF/type1.pdf. A slightly more involved thing is needed there; I added a FIXME for that here.	2023-10-28 13:28:05 -04:00
Tim Ledbetter	5c0c55d2c0	LibPDF: Ensure xref stream field widths are within expected range Previously, an xref stream with a field width larger than 8 would result in an undefined shift occurring. We now ensure that each field width is a number and is less than or equal to 8.	2023-10-28 13:17:09 -04:00
Nico Weber	6d47fca3bf	LibPDF: Don't assert on outline destinations that use `null` as page Nothing in PDF 1.7 spec 8.2.1 Destinations mentions the page being `null`, but it happens in 0000372.pdf (for the root outline element) and in 0000776.pdf (for every outline element, which looks like a bug in the generator maybe) of 0000.zip from the pdfa dataset.	2023-10-27 06:38:25 -04:00
Tim Ledbetter	b4296e1c9b	LibPDF: Don't use unsanitized values in error messages Previously, constructing error messages with unsanitized input could fail because error message strings must be UTF-8.	2023-10-26 11:05:32 +02:00
Nico Weber	f8bf9c6506	LibPDF: Sketch out DeviceN color spaces a bit Documents using them now show render-time diagnostics instead of asserting because the number of parameters passed to a color doesn't match the number of channels of the previously set color space. Fixes two asserts on the `-n 500` 0000.zip test set.	2023-10-26 11:05:00 +02:00
Nico Weber	4549d6cf1b	LibPDF: Add a FIXME comment to the inline image data skipping path	2023-10-26 10:59:45 +02:00
Nico Weber	2878af5968	LibPDF: Sketch out Lab color space Same as other recent color spaces: Enough to make us not assert, but not enough to actually produce color. Fixes 2 asserts on the `-n 500` 0000.zip pdfa dataset.	2023-10-26 10:59:45 +02:00
Nico Weber	a65d8ff2ea	LibPDF: Tolerate page rotation being an indirect object Needed e.g. for 0000196.pdf in 0000.zip in the pdfa dataset.	2023-10-26 10:58:45 +02:00
Nico Weber	8b806183f6	LibPDF: Tolerate indirect objects in various image dict values 0000101.pdf from 0000.zip from the pdfa dataset has /Height set to an indirect object that contains an int. Make that work, and make sure various other similar places getting values of the image dict also resolve indirect references.	2023-10-26 10:58:45 +02:00
Nico Weber	5dd7639386	LibPDF: Tolerate indirect references in Type0 /W array Makes e.g. 0000236.pdf in 0000.zip in the pdfa dataset work.	2023-10-26 10:58:45 +02:00
Nico Weber	b928fadba7	LibPDF: Swap int and array branches in outline item reading No intended behavior change. It does have the effect that indirect object references now go down the array path instead of the number path. They still fall over there, but now that's easy to fix.	2023-10-26 10:58:45 +02:00
Nico Weber	208a058eab	LibPDF: Tolerate integer outline item colors 0000296.pdf from 0000.zip from the pdfa dataset contains `/C [0 0 0]` (as opposed to `/C [0.0 0.0 0.0]`). Make that work. (It's fine per spec.)	2023-10-26 10:58:45 +02:00
Nico Weber	54cdcd0d06	LibPDF: Reject non-hexdigits in hex string with error ...instead of VERIFY()ing input data. I haven't seen this in the wild, but since I'm here anyways, might as well fix this.	2023-10-25 10:44:26 +02:00
Nico Weber	4675700057	LibPDF: Reject unterminated literal strings with an error 0000459.pdf in 0000.zip in the pdfa dataset contains this as the very first object: ``` 1 0 obj << /Creator (Developer 2000) /CreatorDate ( /Author (Oracle Reports) /Producer (Oracle PDF driver) /Title (2021_06_29 Tutoritzacions APTES.PDF) >> endobj ``` The `/CreatorDate` value string is unterminated. Before, we'd assert when trying to check if the first object is a linearization dict. Now, we never read the first object (an error during the linearization dict reading is treated as "file is not linearized") unless we try to print the document's metadata -- and there we now show an error instead of asserting.	2023-10-25 10:44:26 +02:00
Nico Weber	c0f3f1674c	LibPDF: Make string literal parsing fallible ...and make running out of data after a \ an error instead of silently returning an empty string.	2023-10-25 10:44:26 +02:00
Nico Weber	311cc7d9b9	LibPDF: Implement two SeparationColorSpace methods Actually using separation color spaces still doesn't work, but we now no longer assert on them when they're used. Fixes 2 crashes on the `-n 500` 0000.zip pdfa dataset.	2023-10-25 05:52:47 +02:00
Nico Weber	e7f7c434f7	LibPDF: Don't check for `startxref` after trailer dict Several files have a comment after the trailer dict and the `startxref` after it. We really should add a consume_whitespace_and_comments() function and call that in most places we currently call consume_whitespace(). But in this case, for non-linearized files, we first jump to the end of the file, read `startxref`, then jump to `xref` from the offset there, and then read the trailer after the `xref`, only to read `startxref` again. So we can just not do that. (For linearized files, we now completely ignore `startxref`. But we don't use the data in `startxref` in linearized files anyways, so it's fine to not read it there too.) Reduces number of crashes on 300 random PDFs from the web (the first 300 from 0000.zip from https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/) from 25 (8%) to 23 (7%).	2023-10-24 13:32:01 -04:00
Nico Weber	acf668e234	LibPDF: Make Reader::move_by() parameter more truthful No behavior change, just simpler and less surprising.	2023-10-24 13:30:25 -04:00
Nico Weber	3fe9f8e48d	LibPDF: Don't accidentally form new tokens on pages with contents arrays A page's /Contents can be an array of streams, and the page's contents are then as if those streams are concatenated. Most of the time, a stream ends with whitespace. But in some cases (e.g. 0000642.pdf from 0000.zip from the pdfa dataset), the first stream ends with an operator (`Q`) and the next stream starts with one (`q`), and the concatenation would form a new, unknown operator (`Qq`). Separate the streams' contents with a space to prevent that. Reduces the number of PDF files we fail to open in the -n 500 case from 11 to 10 (in either case, we then crash on 18 of the PDFs that we do manage to open).	2023-10-23 13:23:54 -04:00
Nico Weber	11bee7a075	LibPDF: Don't crash on fixed-width type 1 fonts that use /MissingWidth Type 1 fonts usually have a m_font_program and no m_font -- they only have m_font if we're using a replacement font for the fonts that were built-in to PDFs before Acrobat 4.0 (and must still work to show existing files). However, SimpleFont::get_glyph_width() used to always return a float, which in Type1Font was only implemented if m_font was set. Per spec, we're supposed to just use /MissingWidth for fonts that are missing an entry in the descriptor's /Width array. However, for built-in fonts, no explicit /Width array is needed (PDF 1.7 spec, Appendix H.3, 5.5.1). So if we just always use /MissingWidth, then PDFs that use a built-in font draw all their text on top of each other (e.g. 000333.pdf from stillhq.com-pdfdb). So change get_glyph_width() to return Optional<float>, return it only in Type1Font if m_font is set, and use MissingWidth if it isn't set. That way, replacement fonts still return a width, and real fonts that are supposed to have /Width and use /MissingWidth for missing entries do what they're supposed to too, instead of crashing. From 20 (6%) to 16 (5%) crashes on the 300 first PDFs, and from 39 (7.8%) to 31 (6.2%) on the 500-random PDFs test.	2023-10-23 09:33:03 -04:00
Nico Weber	52afa936c4	LibPDF: Don't over-read in charset formats 1 and 2 `left` might be a number bigger than there are actually glyphs in the CFF. The spec says "The number of ranges is not explicitly specified in the font. Instead, software utilizing this data simply processes ranges until all glyphs in the font are covered." Apparently we have to check for this within each range as well. Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the pdfa dataset. Together with the previous commit: From 21 (7%) to 20 (6%) crashes on the 300 first PDFs, and from 41 (8.2%) to 39 (7.8%) on the 500-random PDFs test.	2023-10-23 09:31:11 -04:00
Nico Weber	58ff7b5336	LibPDF: Support offset size 3 in CFF index reading ...and replace template instantiations with a loop, to make this easily possible. Vaguely nice for code size as well. Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the pdfa dataset.	2023-10-23 09:31:11 -04:00
Nico Weber	3197f0cab6	LibPDF: Handle CFF fonts with charset format 0 and > 255 glyphs better We used to use a u8 as the loop counter, which would overflow if there were more than 255 glyphs, producing hundreds of megabytes of `Couldn't find string for SID x, going with space` output in the process, while all data until the end of the CFF section got interpreted as SIDs, until a try_read() would finally fail. We now no longer fail miserably trying to render page 2 of 0000352.pdf of 0000.zip from the pdfa dataset. Fixes just one crash of the larger 500-document test set, but when I tweak test_pdf.py to print all stacks instead of just the top 5, it no longer produces 260 MB of output.	2023-10-23 09:31:11 -04:00
Nico Weber	0869ca5615	LibPDF: Add more CFF_DEBUG output	2023-10-23 09:31:11 -04:00
Nico Weber	cf705eb235	LibPDF: Use TRY() to get decompression result Makes us die with a better error message for some PDFs.	2023-10-23 09:30:41 -04:00
Nico Weber	6153dd7b84	LibPDF: Tolerate comments after dict values Makes 0000607.pdf from 0000.zip from the pdfa dataset load.	2023-10-23 09:28:00 -04:00
Nico Weber	a1f17bd643	LibPDF: Skip inline image data in operator stream Inline images can contain arbitrary binary data in the operator stream, greatly confusing the operator parser. Just skip them for now. They'll produce a `Rendering of feature not supported: draw operation: inline_image_begin` diag as usual, so we won't forget about it. After #21536, reduces number of crashes on 300 random PDFs from the web (the first 300 from 0000.zip from https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/) from 23 (7%) to 22 (7%). On a larger sample (`Meta/test_pdf.py -n 500 ~/Downloads/0000`), reduces number of crashes from 53 (10.6%) with 36 distinct crash stacks to 46 (9.2%) with 33 distinct stacks.	2023-10-23 07:51:08 +02:00
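
Commits a207ab709a and f8799885de touch the last step of color conversion: re-applying gamma with an x^2.2 approximation, and clamping channels that end up slightly above 1.0f before converting to u8. The following is a minimal C++ sketch of that step, assuming the input is one linear-light channel value in roughly [0, 1]; `linear_to_srgb_u8` is a hypothetical name, not LibPDF's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts one linear-light channel value to an 8-bit
// sRGB channel. Uses the x^(1/2.2) approximation rather than the exact
// piecewise sRGB transfer function.
std::uint8_t linear_to_srgb_u8(float linear)
{
    // Clamp first: conversion matrices can leave values slightly outside
    // [0, 1] (e.g. 1.0000001f), which would overflow the u8 below, and
    // negative inputs would make pow() return NaN.
    float clamped = std::clamp(linear, 0.0f, 1.0f);

    // Re-apply gamma (the inverse of the x^2.2 curve).
    float encoded = std::pow(clamped, 1.0f / 2.2f);

    // Scale to [0, 255] and round to the nearest integer.
    return static_cast<std::uint8_t>(std::lround(encoded * 255.0f));
}
```

Clamping before `pow()` rather than after also keeps slightly negative inputs from turning into NaN.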

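Commit 5c0c55d2c0 limits xref stream field widths (the entries of the /W array) to 8. A minimal C++ sketch of why, assuming each field is a big-endian unsigned integer read into a u64; `read_xref_field` and its parameters are hypothetical and do not mirror LibPDF's Reader API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: reads one big-endian field of `width` bytes, starting
// at `offset`, from a decoded xref stream row. Widths come from /W.
std::optional<std::uint64_t> read_xref_field(std::vector<std::uint8_t> const& row, std::size_t offset, int width)
{
    // A width above 8 cannot fit in a u64: the shift below would reach 64
    // bits or more, which is undefined behavior in C++. Reject it instead.
    if (width < 0 || width > 8)
        return std::nullopt;
    // Reject rows too short to contain the field.
    if (offset > row.size() || static_cast<std::size_t>(width) > row.size() - offset)
        return std::nullopt;

    std::uint64_t value = 0;
    for (int i = 0; i < width; ++i) {
        // The first byte is the most significant; the largest shift is 56.
        int shift = 8 * (width - 1 - i);
        value |= static_cast<std::uint64_t>(row[offset + i]) << shift;
    }
    // Width 0 yields 0 here; per the PDF spec an absent field takes its
    // default value, which the caller would have to supply.
    return value;
}
```

The row-length check is an extra safeguard assumed by this sketch; the commit itself only describes the width check.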
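Commit 52afa936c4 quotes the CFF rule that charset ranges are read until all glyphs are covered, and adds a bounds check inside each range because `left` can overshoot. A minimal C++ sketch of that loop, assuming the caller already knows the glyph count (for example from the CharStrings INDEX) and has consumed the format byte; `decode_charset_ranges` is hypothetical. Its `size_t` counters also avoid the kind of u8 overflow that 3197f0cab6 fixed for format 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: decodes the ranges of a CFF charset in format 1
// (nLeft is one byte) or format 2 (nLeft is two bytes). `data` starts right
// after the format byte. Glyph 0 (.notdef) is implicit and gets SID 0.
// Returns one SID per glyph, or std::nullopt on malformed or truncated data.
std::optional<std::vector<std::uint16_t>> decode_charset_ranges(std::vector<std::uint8_t> const& data, int format, std::size_t glyph_count)
{
    if (format != 1 && format != 2)
        return std::nullopt;
    std::vector<std::uint16_t> sids;
    if (glyph_count == 0)
        return sids;
    sids.push_back(0); // .notdef

    std::size_t const left_size = format == 1 ? 1 : 2;
    std::size_t pos = 0;
    // The number of ranges is not stored in the font: keep reading ranges
    // until every glyph has a SID.
    while (sids.size() < glyph_count) {
        if (data.size() - pos < 2 + left_size)
            return std::nullopt;
        auto const first = static_cast<std::uint16_t>((data[pos] << 8) | data[pos + 1]);
        std::size_t const left = left_size == 1
            ? data[pos + 2]
            : static_cast<std::size_t>((data[pos + 2] << 8) | data[pos + 3]);
        pos += 2 + left_size;
        // A range covers first .. first + left, but `left` may claim more
        // glyphs than the font has, so also stop inside the range.
        for (std::size_t i = 0; i <= left && sids.size() < glyph_count; ++i)
            sids.push_back(static_cast<std::uint16_t>(first + i));
    }
    return sids;
}
```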

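Commit 3fe9f8e48d joins the streams of a page's /Contents array with a space. A minimal C++ sketch, assuming the streams are already decoded into strings; `concatenate_content_streams` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the decoded streams of a page's /Contents array.
// A space goes between consecutive streams so that an operator at the end of
// one stream (e.g. `Q`) and one at the start of the next (e.g. `q`) cannot
// merge into a single unknown token such as `Qq`.
std::string concatenate_content_streams(std::vector<std::string> const& streams)
{
    std::string result;
    for (std::size_t i = 0; i < streams.size(); ++i) {
        if (i != 0)
            result += ' ';
        result += streams[i];
    }
    return result;
}
```

A single space suffices because whitespace delimits tokens in content streams, and an extra space after a stream that already ends in whitespace is harmless.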
439 commits