0ct0pu5/ladybird

Author	SHA1	Message	Date
Julian Offenhäuser	34350ee9e7	LibPDF: Allow reading documents with incremental updates The PDF spec allows incremental changes of a document by appending a new XRef table and file trailer to it. These will only contain the changed objects and will point back to the previous change, forming an arbitrarily long chain of XRef sections and file trailers. Every one of those XRef sections may be encoded as an XRef stream as well, in which case the trailer is part of the stream dictionary as usual. To make this easier, I made it so every XRef table may "own" a trailer. This means that the main file trailer is now part of the main XRef table.	2023-02-12 10:55:37 +00:00
Rodrigo Tobar	a533ea7ae6	LibPDF: Improve stream parsing When parsing streams we rely on a /Length item being defined in the stream's dictionary to know how much data comprises the stream. Its value is usually a direct value, but it can be indirect. There was, however, a contradiction in the code: the condition that allowed it to read and use the /Length value required it to be a direct value, but the actual code using the value would have worked with indirect ones. This meant that indirect /Length values triggered the fallback, "manual" stream parsing code. On the other hand, this latter code was also buggy, because it relied on the "endstream" keyword appearing on a separate line, which isn't always the case. This commit fixes the bug in the manual stream parsing scenario and also allows indirect /Length values to be used to parse streams more directly, avoiding the manual approach. The main caveat to this second change is that for a brief period of time the Document is not able to resolve references (i.e., before the xref table itself is parsed). Any parsing happening before that (e.g., the linearization dictionary) must therefore use the manual stream parsing approach.	2023-02-08 19:47:15 +01:00
Tim Schumacher	b1bfeb391e	LibPDF: Use `Core::Stream` to parse the page offset hint table	2023-01-21 00:45:33 +00:00
Julian Offenhäuser	d1bc89e30b	LibPDF: Try to repair XRef tables with broken indices An XRef table usually starts with an object number of zero. While it could technically start at any other number, this is a tell-tale sign of a broken table. For the "broken" documents I encountered, this always meant that some objects must have been removed from the start of the table, without updating the following indices. When this is the case, the document cannot be read normally. However, most other PDF parsers seem to know of this quirk and fix the XRef table automatically. Likewise, we now check for this exact case, and if it matches up with what we expect, we update the XRef table such that all object numbers match the actual objects found in the file again.	2022-11-25 22:44:47 +01:00
Julian Offenhäuser	563d91b6c4	LibPDF: Implement loading compressed objects from object streams Now, whenever the xref table points to a compressed object, parse_object_with_index will look it up in the corresponding object stream as if it were a regular object. With this, our parser gains the bare minimum support for xref streams.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	f9beff7b5e	LibPDF: Initial work on parsing xref streams Since PDF version 1.5, a document may omit the xref table in favor of a new kind of xref stream object. This is used to reference so-called "compressed" objects that are part of an object stream. With this patch we are able to parse this new kind of xref object, but we'll have to implement object streams to use them correctly.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	4887aacec7	LibPDF: Move document-specific parsing functionality into its own class The Parser class is now a generic PDF object parser, from which the new DocumentParser class derives. DocumentParser now takes over all functions relating to linearization, pages, xref and trailer handling. This allows the use of multiple parsers in the same document's context, which will be needed in order to handle PDF object streams.	2022-09-17 10:07:14 +01:00

7 commits
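
The incremental-update commit (34350ee9e7) describes a chain of XRef sections in which each newer section lists only the changed objects and points back to the previous one. A minimal sketch of how such a chain could be resolved is shown below. It is not LibPDF's code: the `XRefSection` class and `merge_xref_chain` function are hypothetical names, and the back-pointer is modelled as a direct object reference rather than the byte offset a real trailer would store.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class XRefSection:
    """One XRef section together with the trailer it "owns" (hypothetical model)."""
    offsets: Dict[int, int]                 # object number -> byte offset in the file
    prev: Optional["XRefSection"] = None    # older section this trailer points back to


def merge_xref_chain(newest: XRefSection) -> Dict[int, int]:
    """Walk from the newest section to the oldest one.

    The first entry seen for an object number wins, so entries from a
    later update shadow the entries of every earlier section.
    """
    merged: Dict[int, int] = {}
    section: Optional[XRefSection] = newest
    while section is not None:
        for number, offset in section.offsets.items():
            # setdefault keeps the entry from the newer section if one exists.
            merged.setdefault(number, offset)
        section = section.prev
    return merged


# An original document with three objects, then one update that rewrites object 2.
original = XRefSection(offsets={1: 15, 2: 80, 3: 200})
update = XRefSection(offsets={2: 950}, prev=original)
merged = merge_xref_chain(update)
print(merged)
```

In this sketch, object 2 resolves to the offset from the update, while objects 1 and 3 still resolve to the offsets recorded in the original section.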