Commit graph

3 commits

Author SHA1 Message Date
Nico Weber
3fe9f8e48d LibPDF: Don't accidentally form new tokens on pages with contents arrays
A page's /Contents can be an array of streams, in which case the page's
contents are treated as if those streams were concatenated.

Most of the time, a stream ends with whitespace. But in some cases
(e.g. 0000642.pdf from 0000.zip from the pdfa dataset), the first
stream ends with an operator (`Q`) and the next stream starts with
one (`q`), and the concatenation would form a new, unknown operator
(`Qq`). Separate the streams' contents with a space to prevent that.

Reduces the number of PDF files we fail to open in the -n 500 case
from 11 to 10 (in either case, we then crash on 18 of the PDFs
that we do manage to open).
2023-10-23 13:23:54 -04:00
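
The token-merging problem described in 3fe9f8e48d can be sketched in isolation. The snippet below is a minimal illustration, not LibPDF's actual code: the function names `join_content_streams` and `concatenate_naively` are hypothetical, and it assumes each stream has already been decoded into a plain `std::string`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not LibPDF's API): joins the decoded bytes of each
// content stream and puts a single space between neighbouring streams, so
// the last token of one stream cannot fuse with the first token of the next.
std::string join_content_streams(std::vector<std::string> const& streams)
{
    std::string joined;
    bool is_first = true;
    for (auto const& stream : streams) {
        if (!is_first)
            joined += ' ';
        joined += stream;
        is_first = false;
    }
    return joined;
}

// Plain concatenation without a separator, shown only for contrast: a stream
// ending in `Q` followed by one starting with `q` yields the bogus token `Qq`.
std::string concatenate_naively(std::vector<std::string> const& streams)
{
    std::string joined;
    for (auto const& stream : streams)
        joined += stream;
    return joined;
}
```

With the two streams `q 1 0 0 1 0 0 cm Q` and `q 0 0 10 10 re f Q`, the naive version contains `Qq` at the seam, while the space-separated version keeps `Q` and `q` as distinct operators. An extra space is harmless when a stream already ends with whitespace, since PDF content streams treat runs of whitespace as a single token separator.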
Nico Weber
afb99a67b2 LibPDF: Tweak Page::page_contents() implementation for brevity
Also replace a FIXME with a spec comment that answers it.
2023-07-12 18:22:35 -04:00
Nico Weber
69c965b987 LibPDF: Move code to compute full page contents into Page
Pure code move, no behavior change.
2023-07-12 18:22:35 -04:00