0ct0pu5/ladybird

Author	SHA1	Message	Date
Sam Atkins	d6075ef5b5	LibTextCodec+Everywhere: Make TextCodec::decoder_for() take a StringView We don't need a full String/DeprecatedString inside this function, so we might as well not force users to create one.	2023-02-15 12:48:26 -05:00
Julian Offenhäuser	96064ec5af	LibPDF: Allow filter DecodeParms array entries to be null Filters will use the default values in this case.	2023-02-12 10:55:37 +00:00
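A null DecodeParms entry falling back to defaults can be sketched as follows. This is a minimal illustration, not LibPDF's code: the parameter dictionary is modeled as a hypothetical `std::map<std::string, int>`, and the defaults (Predictor 1, Colors 1, BitsPerComponent 8, Columns 1) are the ones the PDF specification gives for the FlateDecode and LZWDecode filters.

```cpp
#include <map>
#include <optional>
#include <string>

// Decode parameters for FlateDecode/LZWDecode with the PDF specification's defaults.
struct PredictorParams {
    int predictor { 1 };
    int colors { 1 };
    int bits_per_component { 8 };
    int columns { 1 };
};

// Hypothetical stand-in for a PDF dictionary holding integer entries.
using Dictionary = std::map<std::string, int>;

// A null DecodeParms entry (std::nullopt here) yields the defaults unchanged;
// a present dictionary overrides only the keys it actually contains.
PredictorParams predictor_params(std::optional<Dictionary> const& decode_parms)
{
    PredictorParams params;
    if (!decode_parms)
        return params;
    auto lookup = [&](std::string const& key, int fallback) {
        auto it = decode_parms->find(key);
        return it == decode_parms->end() ? fallback : it->second;
    };
    params.predictor = lookup("Predictor", params.predictor);
    params.colors = lookup("Colors", params.colors);
    params.bits_per_component = lookup("BitsPerComponent", params.bits_per_component);
    params.columns = lookup("Columns", params.columns);
    return params;
}
```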
Rodrigo Tobar	a533ea7ae6	LibPDF: Improve stream parsing When parsing streams we rely on a /Length item being defined in the stream's dictionary to know how much data comprises the stream. Its value is usually a direct value, but it can be indirect. There was, however, a contradiction in the code: the condition that allowed it to read and use the /Length value required it to be a direct value, but the actual code using the value would have worked with indirect ones. This meant that indirect /Length values triggered the fallback, "manual" stream parsing code. This latter code was also buggy, because it relied on the "endstream" keyword appearing on a separate line, which isn't always the case. This commit fixes the bug in the manual stream parsing scenario and also allows indirect /Length values to be used, so streams can be parsed directly without the manual approach. The main caveat to this second change is that for a brief period of time the Document is not able to resolve references (i.e., before the xref table itself is parsed). Any parsing happening before that (e.g., the linearization dictionary) must therefore use the manual stream parsing approach.	2023-02-08 19:47:15 +01:00
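The fallback described above, locating "endstream" wherever it appears rather than only at the start of a line, might look roughly like this. The function name `find_stream_data_manually` is hypothetical, and the sketch works on a plain `std::string_view` instead of LibPDF's Reader.

```cpp
#include <optional>
#include <string_view>

// Hypothetical fallback: `bytes` starts right after the EOL that follows the
// "stream" keyword. Find the next "endstream" wherever it appears, even in the
// middle of a line, and return everything before it.
std::optional<std::string_view> find_stream_data_manually(std::string_view bytes)
{
    constexpr std::string_view keyword = "endstream";
    auto const end = bytes.find(keyword);
    if (end == std::string_view::npos)
        return std::nullopt;
    auto data = bytes.substr(0, end);
    // An EOL marker in front of "endstream" is not part of the data: strip LF, CRLF or CR.
    if (!data.empty() && data.back() == '\n')
        data.remove_suffix(1);
    if (!data.empty() && data.back() == '\r')
        data.remove_suffix(1);
    return data;
}
```

Because a payload can itself contain the bytes `endstream`, this search is only a fallback; a resolvable /Length, direct or indirect, remains the reliable way to delimit the data.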
Timothy Flynn	f3db548a3d	AK+Everywhere: Rename FlyString to DeprecatedFlyString DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, B) write a new FlyString class that is tied to String.	2023-01-09 23:00:24 +00:00
Julian Offenhäuser	a37f3390dc	LibPDF: Allow numbers to start with whitespace	2023-01-09 22:54:36 +00:00
Rodrigo Tobar	d9718064d1	LibPDF: Add support for multi-line comments The code parsing comments parsed only a single line of comments, but callers assumed it parsed all comments that appeared contiguously in a block. The latter is an easier-to-understand API, so this commit changes the parse_comment function to parse entire blocks of comments instead of single lines.	2022-12-16 10:04:23 +01:00
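A block-oriented comment parser of the kind the commit describes could be sketched like this; `parse_comment_block` and its offset-by-reference interface are assumptions made for illustration, not the LibPDF signature.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: starting at `pos`, consume every comment line in a
// contiguous block ("%..." up to the end of line) and return their text,
// one line per comment. `pos` is advanced past the whole block.
std::string parse_comment_block(std::string_view input, std::size_t& pos)
{
    std::string comments;
    while (pos < input.size() && input[pos] == '%') {
        ++pos; // skip the '%'
        std::size_t const start = pos;
        while (pos < input.size() && input[pos] != '\r' && input[pos] != '\n')
            ++pos;
        comments.append(input.substr(start, pos - start));
        comments.push_back('\n');
        // Consume the EOL marker (CR, LF or CRLF) so the next '%' is visible.
        if (pos < input.size() && input[pos] == '\r')
            ++pos;
        if (pos < input.size() && input[pos] == '\n')
            ++pos;
    }
    return comments;
}
```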
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Rodrigo Tobar	e776048309	LibPDF: Ignore whitespace on hex strings The spec says that whitespace should be ignored, but we weren't ignoring it. PDFs with whitespace in their hex strings were thus crashing the parser.	2022-11-30 14:51:14 +01:00
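Decoding a hex string while skipping whitespace can be sketched as below. The helper names are hypothetical; the whitespace set (NUL, HT, LF, FF, CR, SP) and the rule that a missing final digit is treated as 0 come from the PDF specification.

```cpp
#include <optional>
#include <string>
#include <string_view>

// PDF whitespace characters: NUL, HT, LF, FF, CR and SP.
constexpr bool is_pdf_whitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

constexpr int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Decode the text between '<' and '>' of a hex string. Whitespace is skipped;
// an odd number of digits is completed by treating the missing final digit as 0.
std::optional<std::string> decode_hex_string(std::string_view body)
{
    std::string out;
    int high = -1; // pending high nibble, or -1 if none
    for (char c : body) {
        if (is_pdf_whitespace(c))
            continue;
        int const value = hex_digit_value(c);
        if (value < 0)
            return std::nullopt; // not a hex digit: malformed string
        if (high < 0) {
            high = value;
        } else {
            out.push_back(static_cast<char>(high * 16 + value));
            high = -1;
        }
    }
    if (high >= 0)
        out.push_back(static_cast<char>(high * 16));
    return out;
}
```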
Julian Offenhäuser	0bc3333740	LibPDF: Parse integer numbers with atoi() instead of strtof() strtof() produces rounding errors for very large numbers, which we don't want for integers, as they may have to be precise.	2022-11-19 15:42:08 +01:00
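The precision problem is easy to reproduce: a single-precision float has a 24-bit significand, so integers above 2^24 cannot all be represented exactly. The sketch contrasts `std::strtof` with an exact integer path. Note that the commit itself switched to `atoi()`, whereas this sketch substitutes `std::from_chars`, which also reports malformed input instead of silently returning 0.

```cpp
#include <charconv>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Float-based parsing: fine for reals, lossy for large integers.
float parse_via_strtof(std::string_view text)
{
    std::string copy(text); // strtof needs a NUL-terminated buffer
    return std::strtof(copy.c_str(), nullptr);
}

// Integer path: exact for every value that fits in 64 bits.
std::optional<std::int64_t> parse_integer(std::string_view text)
{
    if (!text.empty() && text.front() == '+')
        text.remove_prefix(1); // PDF allows a leading '+', from_chars does not
    std::int64_t value = 0;
    auto [ptr, ec] = std::from_chars(text.data(), text.data() + text.size(), value);
    if (ec != std::errc() || ptr != text.data() + text.size())
        return std::nullopt; // empty, malformed, out of range, or trailing junk
    return value;
}
```

For example, `parse_via_strtof("123456789")` yields 123456792 because floats in that range are spaced 8 apart, while `parse_integer` returns the exact value.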
Julian Offenhäuser	c2ad29c85f	LibPDF: Implement PNG predictor decoding for flate filter For flate and LZW filters, the data can be transformed by this predictor function to make it compress better. For us this means that we have to undo this step in order to get the right result. Although this feature is meant for images, I found at least a few documents that use it all over the place, making this step very important.	2022-11-19 15:42:08 +01:00
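Undoing PNG predictors (PDF /Predictor values 10 to 15) means reversing the per-row PNG filters: each encoded row starts with a filter-type byte (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth) followed by the filtered bytes. A minimal sketch, assuming the input is already inflated and using the hypothetical name `undo_png_predictor`:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <vector>

// Paeth predictor from the PNG specification: pick whichever of left (a),
// up (b) or upper-left (c) is closest to a + b - c.
static std::uint8_t paeth(std::uint8_t a, std::uint8_t b, std::uint8_t c)
{
    int const p = int(a) + int(b) - int(c);
    int const pa = std::abs(p - int(a));
    int const pb = std::abs(p - int(b));
    int const pc = std::abs(p - int(c));
    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}

// Hypothetical helper: reverse PNG row filters on already-inflated data.
// Each encoded row is 1 filter-type byte followed by row_bytes filtered bytes.
std::optional<std::vector<std::uint8_t>> undo_png_predictor(
    std::vector<std::uint8_t> const& data, std::size_t columns,
    std::size_t colors = 1, std::size_t bits_per_component = 8)
{
    std::size_t const bytes_per_pixel = (colors * bits_per_component + 7) / 8;
    std::size_t const row_bytes = (columns * colors * bits_per_component + 7) / 8;
    std::size_t const stride = row_bytes + 1;
    if (data.size() % stride != 0)
        return std::nullopt;

    std::vector<std::uint8_t> out;
    out.reserve(data.size() / stride * row_bytes);
    std::vector<std::uint8_t> previous(row_bytes, 0); // row above the first row counts as zeros
    std::vector<std::uint8_t> row(row_bytes);

    for (std::size_t offset = 0; offset < data.size(); offset += stride) {
        std::uint8_t const type = data[offset];
        for (std::size_t i = 0; i < row_bytes; ++i) {
            std::uint8_t const x = data[offset + 1 + i];
            std::uint8_t const left = i >= bytes_per_pixel ? row[i - bytes_per_pixel] : 0;
            std::uint8_t const up = previous[i];
            std::uint8_t const up_left = i >= bytes_per_pixel ? previous[i - bytes_per_pixel] : 0;
            switch (type) {
            case 0: row[i] = x; break; // None
            case 1: row[i] = std::uint8_t(x + left); break; // Sub
            case 2: row[i] = std::uint8_t(x + up); break; // Up
            case 3: row[i] = std::uint8_t(x + (left + up) / 2); break; // Average
            case 4: row[i] = std::uint8_t(x + paeth(left, up, up_left)); break; // Paeth
            default: return std::nullopt; // unknown filter type
            }
        }
        out.insert(out.end(), row.begin(), row.end());
        previous = row;
    }
    return out;
}
```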
Julian Offenhäuser	16ed407c01	LibPDF: Support cascading stream filters You can specify multiple filters as an array, where each one is fed the output of the one before it.	2022-11-19 15:42:08 +01:00
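Cascading filters amount to folding the data through each decoder in array order. The sketch below is illustrative only: it wires up two real but simple PDF filters, ASCIIHexDecode and RunLengthDecode, behind a hypothetical name-to-function table, and ignores DecodeParms.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Decoder = std::function<std::optional<Bytes>(Bytes const&)>;

// ASCIIHexDecode: pairs of hex digits, whitespace ignored, '>' ends the data.
std::optional<Bytes> ascii_hex_decode(Bytes const& in)
{
    Bytes out;
    int high = -1;
    for (std::uint8_t c : in) {
        if (c == '>')
            break;
        if (std::isspace(c))
            continue;
        if (!std::isxdigit(c))
            return std::nullopt;
        int const value = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        if (high < 0) {
            high = value;
        } else {
            out.push_back(std::uint8_t(high * 16 + value));
            high = -1;
        }
    }
    if (high >= 0)
        out.push_back(std::uint8_t(high * 16)); // odd digit count: final digit padded with 0
    return out;
}

// RunLengthDecode: length byte 0..127 copies the next n+1 bytes,
// 129..255 repeats the next byte 257-n times, 128 marks end of data.
std::optional<Bytes> run_length_decode(Bytes const& in)
{
    Bytes out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint8_t const length = in[i++];
        if (length == 128)
            break;
        if (length < 128) {
            std::size_t const count = std::size_t(length) + 1;
            if (i + count > in.size())
                return std::nullopt;
            out.insert(out.end(), in.begin() + std::ptrdiff_t(i), in.begin() + std::ptrdiff_t(i + count));
            i += count;
        } else {
            if (i >= in.size())
                return std::nullopt;
            out.insert(out.end(), std::size_t(257 - length), in[i++]);
        }
    }
    return out;
}

// Feed the output of each filter into the next, in the order listed in /Filter.
std::optional<Bytes> decode_cascade(Bytes data, std::vector<std::string> const& filters)
{
    static std::map<std::string, Decoder> const decoders {
        { "ASCIIHexDecode", ascii_hex_decode },
        { "RunLengthDecode", run_length_decode },
    };
    for (auto const& name : filters) {
        auto it = decoders.find(name);
        if (it == decoders.end())
            return std::nullopt; // unsupported filter
        auto decoded = it->second(data);
        if (!decoded)
            return std::nullopt;
        data = std::move(*decoded);
    }
    return data;
}
```

For example, with `/Filter [/ASCIIHexDecode /RunLengthDecode]` the text `02414243 FE58 80>` first becomes the bytes `02 41 42 43 FE 58 80`, which run-length decode to "ABCXXX".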
Julian Offenhäuser	becd648a78	LibPDF: Parse hexadecimal values in name objects correctly	2022-11-19 15:42:08 +01:00
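In name objects, a `#` followed by two hex digits encodes a single byte (for example `/Lime#20Green` is the name "Lime Green"). A sketch of that decoding, operating on the characters after the leading `/`; `decode_name` is a hypothetical helper.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Decode the characters of a name object after its leading '/':
// "#xx" (two hex digits) stands for the byte 0xXX; everything else is literal.
std::optional<std::string> decode_name(std::string_view raw)
{
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '#') {
            out.push_back(raw[i]);
            continue;
        }
        if (i + 2 >= raw.size())
            return std::nullopt; // '#' must be followed by two more characters
        int const high = hex_nibble(raw[i + 1]);
        int const low = hex_nibble(raw[i + 2]);
        if (high < 0 || low < 0)
            return std::nullopt;
        out.push_back(static_cast<char>(high * 16 + low));
        i += 2; // skip the two digits just consumed
    }
    return out;
}
```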
Julian Offenhäuser	2f71e0f09a	LibPDF: Allow text operator sequences to start with whitespace	2022-10-16 17:44:54 +02:00
Julian Offenhäuser	633e1632d0	LibPDF: Allow whitespace other than EOL after an object marker	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	65e83bed53	LibPDF: Disallow parsing indirect values as operands An operation like 0 0 0 RG would have been confused for [ 0, 0 0 R ] G	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	4887aacec7	LibPDF: Move document-specific parsing functionality into its own class The Parser class is now a generic PDF object parser, from which the new DocumentParser class derives. DocumentParser now takes over all functions relating to linearization, pages, xref and trailer handling. This allows the use of multiple parsers in the same document's context, which will be needed in order to handle PDF object streams.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	9f4659cc63	LibPDF: Move consume and match helper functions to the Reader class	2022-09-17 10:07:14 +01:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
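The difference between the two construction paths can be shown with the standard library's `std::string_view`, used here as a stand-in for AK's StringView (an assumption: AK's literal operator is analogous, not identical).

```cpp
#include <string_view>

using namespace std::string_view_literals;

// Pointer constructor: the length is found by scanning for the terminating NUL
// (via char_traits::length), unless the compiler can fold it at compile time.
std::string_view from_pointer(char const* s)
{
    return std::string_view(s);
}

// The ""sv literal operator receives the length directly from the compiler.
constexpr std::string_view keyword = "endstream"sv;
static_assert(keyword.size() == 9);
```

A side effect worth knowing: the literal form keeps embedded NUL bytes, so `"a\0b"sv` has size 3 while `from_pointer("a\0b")` stops at size 1.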
Idan Horowitz	086969277e	Everywhere: Run clang-format	2022-04-01 21:24:45 +01:00
Matthew Olsson	468ceb1b48	LibPDF: Rename Command to Operator This is the correct name, according to the spec	2022-03-31 18:10:45 +02:00
Matthew Olsson	4e81663b31	LibPDF: Attempt to decrypt strings and streams	2022-03-29 02:52:57 +02:00
Matthew Olsson	60c3e786be	LibPDF: Require Document* in Parser constructor This makes it a bit easier to avoid calling parser->set_document, an issue which cost me ~30 minutes to find.	2022-03-29 02:52:57 +02:00
Matthew Olsson	a8de9cf541	LibPDF: Keep track of the current object index/generation while Parsing This information is required to decrypt encrypted strings/streams.	2022-03-29 02:52:57 +02:00
Matthew Olsson	c98bda8ce6	LibPDF: Get rid of PlainText/Encoded StreamObject This was a small optimization to allow a stream object to simply hold a reference to the bytes in a PDF document rather than duplicating them. However, as we move into features such as encryption, this optimization does more harm than good. This can be revisited in the future if necessary.	2022-03-29 02:52:57 +02:00
Matthew Olsson	6133acb8c0	LibPDF: Allow newlines between xref table and "trailer" keyword	2022-03-07 10:53:57 +01:00
Matthew Olsson	544e44eec1	LibPDF: Fix bad hex string parsing logic	2022-03-07 10:53:57 +01:00
Matthew Olsson	3cfecc3d3b	LibPDF: Remove useless hex string substring call	2022-03-07 10:53:57 +01:00
Matthew Olsson	73cf8205b4	LibPDF: Propagate errors in Parser and Document	2022-03-07 10:53:57 +01:00
Matthew Olsson	c1aa8c4a44	LibPDF: Remove unused function in Parser	2022-03-07 10:53:57 +01:00
Sam Atkins	fa3c61cf5a	LibPDF: Make Filter::decode() return ErrorOr	2022-01-24 22:36:09 +01:00
Sam Atkins	45cf40653a	Everywhere: Convert ByteBuffer factory methods from Optional -> ErrorOr Apologies for the enormous commit, but I don't see a way to split this up nicely. In the vast majority of cases it's a simple change. A few extra places can use TRY instead of manual error checking though. :^)	2022-01-24 22:36:09 +01:00
Simon Woertz	c857b5d22f	LibPDF: Convert `PDF::Parser::m_document` from `RefPtr` to `WeakPtr` Otherwise both `PDF::Document` and `PDF::Parser` have a `RefPtr` pointing to each other which leads to a memory leak due to a circular dependency.	2022-01-08 18:57:55 +01:00
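The ownership fix can be modeled with standard smart pointers, using `std::shared_ptr` and `std::weak_ptr` as stand-ins for AK's RefPtr and WeakPtr. The `Document`/`Parser` structs and `create_document()` below are simplified, hypothetical versions, not the LibPDF classes.

```cpp
#include <memory>

struct Document;

// The parser only observes its document: a weak, non-owning back-reference.
struct Parser {
    std::weak_ptr<Document> document;
};

// The document owns its parser.
struct Document {
    std::shared_ptr<Parser> parser;
};

std::shared_ptr<Document> create_document()
{
    auto document = std::make_shared<Document>();
    document->parser = std::make_shared<Parser>();
    document->parser->document = document; // weak: does not bump the strong count
    return document;
}
```

With two `shared_ptr`s pointing at each other, neither count would ever reach zero; the weak back-reference lets the document be freed as soon as its last owner goes away, which in turn frees the parser.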
Andreas Kling	216e21a1fa	AK: Convert AK::Format formatting helpers to returning ErrorOr<void> This isn't a complete conversion to ErrorOr<void>, but a good chunk. The end goal here is to propagate buffer allocation failures to the caller, and allow the use of TRY() with formatting functions.	2021-11-17 00:21:13 +01:00
Simon Woertz	b87ab989a3	LibPDF: Check if there is data left before consuming Add a check to `Parser::consume_eol` to ensure that there is more data to read before actually consuming any data. Not checking if there is data left leads to a failed assertion in the case of, e.g., a truncated PDF file.	2021-11-16 00:16:57 +01:00
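A bounds-checked `consume_eol` might look like the following. The `Reader` here is a hypothetical, minimal stand-in; it also accepts all three EOL forms (CRLF, LF, CR), which covers the Windows-style line break after the "stream" keyword mentioned in an older commit.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical minimal reader over an in-memory buffer.
class Reader {
public:
    explicit Reader(std::string_view bytes)
        : m_bytes(bytes)
    {
    }

    bool done() const { return m_offset >= m_bytes.size(); }
    std::size_t offset() const { return m_offset; }
    bool matches(char c) const { return !done() && m_bytes[m_offset] == c; }
    void consume() { ++m_offset; }

    // Consume one EOL marker (CRLF, LF or CR). Returns false, without reading
    // past the end of the buffer, if the data is exhausted or no EOL is present.
    bool consume_eol()
    {
        if (done())
            return false; // truncated input: nothing left to consume
        if (matches('\r')) {
            consume();
            if (matches('\n'))
                consume();
            return true;
        }
        if (matches('\n')) {
            consume();
            return true;
        }
        return false;
    }

private:
    std::string_view m_bytes;
    std::size_t m_offset { 0 };
};
```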
Andreas Kling	80d4e830a0	Everywhere: Pass AK::ReadonlyBytes by value	2021-11-11 01:27:46 +01:00
Andreas Kling	a15ed8743d	AK: Make ByteBuffer::try_* functions return ErrorOr<void> Same as Vector, ByteBuffer now also signals allocation failure by returning an ENOMEM Error instead of a bool, allowing us to use the TRY() and MUST() patterns.	2021-11-10 21:58:58 +01:00
Brendan Coles	6ccfa3e75e	LibPDF: Parser::parse_header() return false if remaining bytes is zero	2021-10-30 17:34:56 +02:00
Ben Wiederhake	f84a7e2e22	LibPDF: Replace Value class by AK::Variant This decreases the memory consumption by LibPDF by 4 bytes per Value, compensating exactly for the increase in an earlier commit. :^)	2021-09-20 17:39:36 +04:30
Ben Wiederhake	d344253b08	LibPDF: Extract reference bitpacking into dedicated class	2021-09-20 17:39:36 +04:30
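The log does not say how LibPDF packs a reference, so the layout below is an assumption: a hypothetical `Reference` that stores the object index in bits 16 to 47 and the generation number in bits 0 to 15 of one 64-bit word, so the whole reference can be copied and compared as a single integer.

```cpp
#include <cstdint>

// Hypothetical packing: object index in bits 16..47, generation in bits 0..15.
class Reference {
public:
    constexpr Reference(std::uint32_t index, std::uint16_t generation)
        : m_combined((static_cast<std::uint64_t>(index) << 16) | generation)
    {
    }

    constexpr std::uint32_t index() const { return static_cast<std::uint32_t>(m_combined >> 16); }
    constexpr std::uint16_t generation() const { return static_cast<std::uint16_t>(m_combined & 0xffff); }

    // Equality is a single integer comparison on the packed word.
    constexpr bool operator==(Reference const& other) const { return m_combined == other.m_combined; }

private:
    std::uint64_t m_combined;
};

static_assert(sizeof(Reference) == sizeof(std::uint64_t));
```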
Ben Wiederhake	da170997d5	LibPDF: Move inline function definition This breaks the dependency cycle between Parser and Document.	2021-09-20 17:39:36 +04:30
Ali Mohammad Pur	97e97bccab	Everywhere: Make ByteBuffer::{create_*,copy}() OOM-safe	2021-09-06 01:53:26 +02:00
Ali Mohammad Pur	3a9f00c59b	Everywhere: Use OOM-safe ByteBuffer APIs where possible If we can easily communicate failure, let's avoid asserting and report failure instead.	2021-09-06 01:53:26 +02:00
Hendiadyoin1	ed46d52252	Everywhere: Use AK/Math.h if applicable AK's version should see better inlining behavior than the LibM one. We avoid mixed usage for now though. Also clean up some stale math includes and improper floating-point usage.	2021-07-19 16:34:21 +04:30
Wesley Moret	1b8f73b6b3	LibPDF: Fix treating not finding the linearized dict as a fatal error We now try to parse the first indirect value and see if it's the `Linearization Parameter Dictionary`. If it's not, we fall back to reading the xref table from the end of the document.	2021-07-16 20:44:10 +02:00
Wesley Moret	5d4d70355e	LibPDF: Fix checking `minor_ver` instead of `major_ver`	2021-07-16 20:44:10 +02:00
Matthew Olsson	612b183703	LibPDF: Convert to east-const to comply with the recent style changes	2021-06-12 22:45:01 +04:30
Matthew Olsson	ea3abb14fe	LibPDF: Parse hint tables This code isn't _actually_ used as of right now, but I wrote it at the same time as all of the code in the previous commit. I realized after I wrote it that these hint tables aren't super useful if the parser already has access to the full file. However, this will be useful if we ever want to stream PDFs from the web (and possibly view them in the browser).	2021-06-12 22:45:01 +04:30
Matthew Olsson	e23bfd7252	LibPDF: Parse linearized PDF files This is a big step, as most PDFs which are downloaded online will be linearized. Pretty much the only difference is that the xref structure is slightly different.	2021-06-12 22:45:01 +04:30
Matthew Olsson	be1be47613	LibPDF: Fix two parser bugs - A newline was assumed to follow the "stream" keyword, when it can also be a Windows-style line break - Fix not consuming the "endobj" at the end of every indirect object	2021-06-12 22:45:01 +04:30

64 commits