0ct0pu5/ladybird

Author	SHA1	Message	Date
Nico Weber	6de32e5359	LibPDF: Draw inline images The idea is to massage the inline image data into something that looks like a regular image, and then use the normal image drawing code: We translate the inline image abbreviations to the expanded version at rendering time, then unfilter (i.e. uncompress) the image data at rendering time, and then go down the usual image drawing path. Normal streams are unfiltered when they're first accessed, but inline image streams live in a page's drawing operators, and this fits the current approach of parsing a page's operators anew every time the page is rendered. (We also need to add some special-case handling for color spaces of inline images: Inline images can use named color spaces, while regular images always use direct color space objects.)	2023-12-20 12:45:16 -07:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only \| grep \.cpp\\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Idan Horowitz	aad5c58996	LibPDF: Eliminate reference cycle between OutlineItem parent/children Since all parents held a reference pointer to their children, and all children held reference pointers to their parents, both objects would never get free'd once the document was no longer being used. Fixes ossfuzz-63833.	2023-12-02 22:23:53 +01:00
Nico Weber	e39a790c82	LibPDF: Stop converting encodings in object parser Per 1.7 spec 3.8.1, there are multiple logical text string types: * text strings * ASCII strings * byte strings Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0) UTF-8. But byte strings shouldn't be converted but treated as binary data. This makes us no longer convert strings used for drawing page text. TABLE 5.6 "Text-showing operators" lists the operands for text-showing operators as just "string", not "text string" (even though these strings confusingly are called "text strings" in the body text), so not doing this there is correct (and matches other viewers). We also no longer incorrectly convert strings used for crypto data (such as passwords), if they start with a UTF-16BE or UTF-8 marker. No behavior change for outlines and info dict entries. https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of this. (ASCII strings only contain ASCII characters and behave the same anyways.)	2023-11-22 09:08:06 -07:00
Ali Mohammad Pur	78c04cb8b2	AK+LibPDF: Make Format print floats in a roundtrip-safe way by default Previously we assumed a default precision of 6, which made the printed values quite odd in some cases. This commit changes that default to print them with just enough precision to produce the exact same float when roundtripped. This commit adds some new tests that assert exact format outputs, which have to be modified if we decide to change the default behaviour.	2023-10-31 09:12:35 +03:30
Nico Weber	f646e47d46	LibPDF: Extract a create_destination_from_object() function No big behavior change. The new function now produces an error if a destination isn't in one of the supported formats.	2023-10-18 06:29:02 -04:00
Nico Weber	532230c0e4	LibPDF: Extract a Document::read_filters() method No behavior change.	2023-07-24 09:50:45 -04:00
Nico Weber	ca433befa0	LibPDF: Add method to Document to dump a Page and all related objects ...except for the /Parent object, else we'd print all pages :)	2023-07-13 20:29:58 +02:00
Nico Weber	f4f8a6a1bf	LibPDF: Move Page into its own file Page.h	2023-07-12 18:22:35 -04:00
Nico Weber	ea89053c12	LibPDF: Make PDF version accessible on Document	2023-07-11 13:49:17 -04:00
Nico Weber	c5c940b1c9	LibPDF: Add accessor for the document's info dict This dict contains some metadata in some files. Newer files also contain XMP metadata, but it's recommended to still include this dict as well, for compatibility with older readers. And it's much less complex than XMP, so let's support it.	2023-07-10 17:49:07 +01:00
Nico Weber	93357a8b70	LibPDF: Fix a typo in a function name ...and while here, a comment typo too.	2023-07-05 18:42:39 +01:00
Julian Offenhäuser	95a804bc4e	LibPDF: Allow the page rotation to be inherited	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	b90a794d78	LibPDF: Allow pages with no specified contents The contents object may be omitted as per spec, which will just leave the page blank.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	fde990ead8	LibPDF: Allow optional inheritable page attributes Previously, get_inheritable_object would always try to find the object and throw an error if it couldn't. The spec tells us that some page attributes, like CropBox, are optional but also inheritable. Others, like the media box and resources, are technically required by the spec, but omitted by some documents. In both cases, we are now able to search for inheritable objects and find a suitable replacement if there wasn't one.	2023-03-25 16:27:30 -06:00
Andreas Kling	8a48246ed1	Everywhere: Stop using NonnullRefPtrVector This class had slightly confusing semantics and the added weirdness doesn't seem worth it just so we can say "." instead of "->" when iterating over a vector of NNRPs. This patch replaces NonnullRefPtrVector<T> with Vector<NNRP<T>>.	2023-03-06 23:46:35 +01:00
Rodrigo Tobar	a533ea7ae6	LibPDF: Improve stream parsing When parsing streams we rely on a /Length item being defined in the stream's dictionary to know how much data comprises the stream. Its value is usually a direct value, but it can be indirect. There was however a contradiction in the code: the condition that allowed it to read and use the /Length value required it to be a direct value, but the actual code using the value would have worked with indirect ones. This meant that indirect /Length values triggered the fallback, "manual" stream parsing code. On the other hand, this latter code was also buggy, because it relied on the "endstream" keyword to appear on a separate line, which isn't always the case. This commit both fixes the bug in the manual stream parsing scenario, while also allowing for indirect /Length values to be used to parse streams more directly and avoid the manual approach. The main caveat to this second change is that for a brief period of time the Document is not able to resolve references (i.e., before the xref table itself is parsed). Any parsing happening before that (e.g., the linearization dictionary) must therefore use the manual stream parsing approach.	2023-02-08 19:47:15 +01:00
Timothy Flynn	f3db548a3d	AK+Everywhere: Rename FlyString to DeprecatedFlyString DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, B) write a new FlyString class that is tied to String.	2023-01-09 23:00:24 +00:00
Rodrigo Tobar	a5620fd41f	LibPDF: Load destinations from Catalogue -> Names -> Dests name tree PDF allows for named destinations to be provided as string. These can be either found in the Dests dictionary in the document catalogue (as already implemented), or in the Name Tree specified by the Dests key in the Names dictionary of the document catalogue (missing). This commit adds this missing case. Once the named destination is found in the name tree, its value is interpreted just like in the first case, so a new utility method encapsulates the common behavior.	2023-01-06 18:06:41 +01:00
Rodrigo Tobar	5420261347	LibPDF: Implement name tree lookups Name Trees are hierarchical, string-keyed, sorted-by-key dictionary structures in PDF where each node (except the root) specifies the bounds of the values it holds, and either its kids (more nodes) or the key/value pairs it contains. This commit implements a series of lookup calls for finding a key in such name trees. This implementation follows the tree as needed on each lookup, but if that becomes inefficient in the long run we can switch to creating a HashMap with all the contents, which as a drawback will require more memory.	2023-01-06 18:06:41 +01:00
Rodrigo Tobar	0e1c858f90	LibPDF: Move casting code to its own cast_to function This functionality was previously part of the resolve_to() Document method, and thus only available when resolving objects through the Document class. There are many use cases where this casting can be used, but no resolution is needed. This commit moves this functionality into a new cast_to function, and makes the resolve_to function call it internally. With this new function in place we can now offer new versions of DictObject::get_* and ArrayObject::get_*_at that don't perform Document resolution unnecessarily when not required.	2023-01-06 18:06:41 +01:00
Rodrigo Tobar	f510b2b180	LibPDF: Support null destination parameters Destination arrays contain a page number, a mode name, and parameters specific to that mode. In many cases these parameters can be set to "null", which our code wasn't taking into consideration. This commit parses these parameters taking into account whether they are null or actual numbers, and stores them as Optional<float> instead of plain floats. The parameters are not yet used anywhere else other than when formatting a Destination object, so the change is fairly small.	2023-01-06 18:06:41 +01:00
Rodrigo Tobar	2485c500a3	LibPDF: Fix Destination formatting This was not correctly written, and thus printed confusing output.	2023-01-06 18:06:41 +01:00
Rodrigo Tobar	6df9aa8f2c	LibPDF: Store page number, not Value, in OutlineItem The Value previously stored corresponded to a Reference to a Page object in the PDF document. This isn't useful information, since what we want to display at the end of the day is the page an outline item refers to. This commit changes the page member on OutlineItem to be a Optional<u32> (some destinations don't necessarily refer to a Page), which we resolve while building OutlineItems.	2022-12-17 19:40:52 +01:00
Rodrigo Tobar	cb1a7cc721	LibPDF: Simplify outline construction While the Outline Items making up the document's Outline have all sorts of cross-references (parent, first/last child, next/previous sibling, etc), not all documents out there have fully-consistent references. Our implementation already discarded some of that information too (e.g., /Parent and /Prev were never read), and trusted that /First and /Next were good enough to traverse the whole hierarchy. Where the current implementation failed was in assuming that /Last was also a good source of information. There are documents out there where /Last also points to dead ends, and were therefore causing a crash when we verified that the last child found on a chain was the /Last child declared by the parent. To fix this I'm simply removing the check, and simplifying the function call to remove any references to /Last. This way we affirm our commitment to /First and /Next as the main sources of information.	2022-12-16 01:24:43 -07:00
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Julian Offenhäuser	36f83cecab	LibPDF: Allow page objects to inherit the MediaBox and Resources entries	2022-10-16 17:44:54 +02:00
Julian Offenhäuser	4887aacec7	LibPDF: Move document-specific parsing functionality into its own class The Parser class is now a generic PDF object parser, of which the new DocumentParser class derives. DocumentParser now takes over all functions relating to linearization, pages, xref and trailer handling. This allows the use of multiple parsers in the same document's context, which will be needed in order to handle PDF object streams.	2022-09-17 10:07:14 +01:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
sin-ack	7456904a39	Meta+Userland: Simplify some formatters These are mostly minor mistakes I've encountered while working on the removal of StringView(char const*). The usage of builder.put_string over Format<FormatString>::format is preferable as it will avoid the indirection altogether when there's no formatting to be done. Similarly, there is no need to do format(builder, "{}", number) when builder.put_u64(number) works equally well. Additionally a few Strings where only constant strings were used are replaced with StringViews.	2022-07-12 23:11:35 +02:00
Matthew Olsson	3ecb41b7d9	PDFViewer: Support a continuous page view mode	2022-04-04 14:59:37 +02:00
Matthew Olsson	5b316462b2	LibPDF: Add implementation of the Standard security handler Security handlers manage encryption and decryption of PDF files. The standard security handler uses RC4/MD5 to perform its crypto (AES as well, but that is not yet implemented).	2022-03-29 02:52:57 +02:00
Matthew Olsson	e9342183f0	LibPDF: Support all Dest types	2022-03-07 10:53:57 +01:00
Matthew Olsson	73cf8205b4	LibPDF: Propagate errors in Parser and Document	2022-03-07 10:53:57 +01:00
Simon Woertz	c857b5d22f	LibPDF: Convert `PDF::Parser::m_document` from `RefPtr` to `WeakPtr` Otherwise both `PDF::Document` and `PDF::Parser` have a `RefPtr` pointing to each other which leads to a memory leak due to a circular dependency.	2022-01-08 18:57:55 +01:00
Andreas Kling	216e21a1fa	AK: Convert AK::Format formatting helpers to returning ErrorOr<void> This isn't a complete conversion to ErrorOr<void>, but a good chunk. The end goal here is to propagate buffer allocation failures to the caller, and allow the use of TRY() with formatting functions.	2021-11-17 00:21:13 +01:00
Andreas Kling	80d4e830a0	Everywhere: Pass AK::ReadonlyBytes by value	2021-11-11 01:27:46 +01:00
Ben Wiederhake	f84a7e2e22	LibPDF: Replace Value class by AK::Variant This decreases the memory consumption by LibPDF by 4 bytes per Value, compensating exactly for the increase in an earlier commit. :^)	2021-09-20 17:39:36 +04:30
Ben Wiederhake	edc0cd29f8	LibPDF: Break weird dependency cycle Old situation: Object.h defines Object Object.h defines ArrayObject ArrayObject requires the definition of Object ArrayObject requires the definition of Value Value.h defines Value Value requires the definition of Object Therefore, a file with the single line "#include <Value.h>" used to raise compilation errors; certainly not something that one might expect from a library. This patch splits up the definitions in Object.h to break the cycle. Now, Object.h only defines Object, Value.h still only defines Value (and includes Object.h), and the new header ObjectDerivatives.h defines ArrayObject (and includes both Object.h and Value.h).	2021-09-20 17:39:36 +04:30
Matthew Olsson	612b183703	LibPDF: Convert to east-const to comply with the recent style changes	2021-06-12 22:45:01 +04:30
Matthew Olsson	e23bfd7252	LibPDF: Parse linearized PDF files This is a big step, as most PDFs which are downloaded online will be linearized. Pretty much the only difference is that the xref structure is slightly different.	2021-06-12 22:45:01 +04:30
Matthew Olsson	78bc9d1539	LibPDF: Refine the distinction between the Document and Parser The Parser should hold information relevant for parsing, whereas the Document should hold information relevant for displaying pages. With this in mind, there is no reason for the Document to hold the xref table and trailer. These objects have been moved to the Parser, which allows the Parser to expose less public methods (which will be even more evident once linearized PDFs are supported).	2021-06-12 22:45:01 +04:30
Matthew Olsson	1ef5071d1b	LibPDF: Harden the document/parser against errors	2021-06-12 22:45:01 +04:30
Matthew Olsson	a08922d2f6	LibPDF: Parse outline structures	2021-05-25 00:24:09 +04:30
Matthew Olsson	d5f94aaa7b	LibPDF/PDFViewer: Support rotated pages	2021-05-18 16:35:23 +02:00
Matthew Olsson	f7ea1eb610	Applications: Add a very simple PDFViewer	2021-05-18 16:35:23 +02:00
Matthew Olsson	d6a9b41bac	LibPDF: Parse page crop box and user units	2021-05-18 16:35:23 +02:00
Matthew Olsson	3aeaceb727	LibPDF: Parse nested Page Tree structures We now follow nested page tree nodes to find all of the actual page dicts, whereas previously we just assumed the root level page tree node contained all of the page children directly.	2021-05-10 10:32:39 +02:00
Matthew Olsson	8c745ad0d9	LibPDF: Parse page structures This commit introduces the ability to parse the document catalog dict, as well as the page tree and individual pages. Pages obviously aren't fully parsed, as we won't care about most of the fields until we start actually rendering PDFs. One of the primary benefits of the PDF format is laziness. PDFs are not meant to be parsed all at once, and the same is true for pages. When a Document is constructed, it builds a map of page number to object index, but it does not fetch and parse any of the pages. A page is only parsed when a caller requests that particular page (and is cached going forwards). Additionally, this commit also adds an object_cast function which logs bad casts if DEBUG_PDF is set. Additionally, utility functions were added to ArrayObject and DictObject to get all types of objects from the collections to avoid having to manually cast.	2021-05-10 10:32:39 +02:00

51 commits