Commit graph

439 commits

Author SHA1 Message Date
Rodrigo Tobar
0e1c858f90 LibPDF: Move casting code to its own cast_to function
This functionality was previously part of the resolve_to() Document
method, and thus available only when resolving objects through the
Document class. There are many use cases where this casting can be used,
but no resolution is needed.

This commit moves this functionality into a new cast_to function, and
makes the resolve_to function call it internally. With this new function
in place we can now offer new versions of DictObject::get_* and
ArrayObject::get_*_at that don't perform Document resolution
unnecessarily when not required.
2023-01-06 18:06:41 +01:00
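To illustrate the separation, here is a minimal sketch built on a simplified object model that uses std::shared_ptr and exceptions. Object, DictObject, ArrayObject and the Document stub are hypothetical stand-ins, not LibPDF's real classes, which use ErrorOr rather than exceptions. In the sketch, cast_to only checks the type, while resolve_to resolves first and then delegates the check to cast_to.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical, heavily simplified object model (not LibPDF's real classes).
struct Object {
    virtual ~Object() = default;
};
struct DictObject : Object { };
struct ArrayObject : Object { };

// Pure type check and cast: no Document and no reference resolution involved.
template<typename T>
std::shared_ptr<T> cast_to(std::shared_ptr<Object> const& object)
{
    auto result = std::dynamic_pointer_cast<T>(object);
    if (!result)
        throw std::runtime_error("object has an unexpected type");
    return result;
}

// Hypothetical Document: resolve_to() resolves first, then reuses cast_to().
struct Document {
    std::shared_ptr<Object> resolve(std::shared_ptr<Object> value)
    {
        // Placeholder: a real implementation would follow indirect references.
        return value;
    }

    template<typename T>
    std::shared_ptr<T> resolve_to(std::shared_ptr<Object> value)
    {
        return cast_to<T>(resolve(std::move(value)));
    }
};
```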
Rodrigo Tobar
f510b2b180 LibPDF: Support null destination parameters
Destination arrays contain a page number, a mode name, and parameters
specific to that mode. In many cases these parameters can be set to
"null", which our code wasn't taking into consideration.

This commit parses these parameters taking into account whether they are
null or actual numbers, and stores them as Optional<float> instead of
plain floats. The parameters are not yet used anywhere other than
when formatting a Destination object, so the change is fairly small.
2023-01-06 18:06:41 +01:00
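A minimal sketch of parsing such parameters into optional floats. The Parameter variant is a hypothetical stand-in for LibPDF's value type. A null becomes an empty optional (for /XYZ, for example, a null parameter means the current value is kept), and integers are widened to float.

```cpp
#include <cstddef>
#include <optional>
#include <variant>
#include <vector>

// Hypothetical stand-in for a destination parameter: null, integer or real.
using Parameter = std::variant<std::nullptr_t, int, float>;

std::vector<std::optional<float>> parse_destination_parameters(std::vector<Parameter> const& parameters)
{
    std::vector<std::optional<float>> result;
    result.reserve(parameters.size());
    for (auto const& parameter : parameters) {
        if (std::holds_alternative<std::nullptr_t>(parameter))
            result.push_back(std::nullopt); // "null": no value given
        else if (auto const* integer = std::get_if<int>(&parameter))
            result.push_back(static_cast<float>(*integer));
        else
            result.push_back(std::get<float>(parameter));
    }
    return result;
}
```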
Rodrigo Tobar
2485c500a3 LibPDF: Fix Destination formatting
This was not correctly written, and thus printed confusing output.
2023-01-06 18:06:41 +01:00
MacDue
eeb6072f15 LibGfx+LibPDF: Apply subpixel offset in affine transformation
2023-01-05 13:50:26 +01:00
MacDue
91db49f7b3 LibPDF: Use subpixel accurate text rendering
This just enables the new tricks from LibGfx with the same nice
improvements :^)
2023-01-05 12:09:35 +01:00
Simon Danner
5fa8068580 LibPDF: Fix calculation of encryption key
Before this patch, the generation of the encryption key was not working
correctly since the lifetime of the underlying data was too short: the
same inputs would give random encryption keys.

Fixes #16668
2023-01-04 11:10:37 -05:00
Ben Wiederhake
c2a900b853 Everywhere: Remove unused includes of AK/StdLibExtras.h
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:

\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b

(Without the linebreaks.)

This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
Ben Wiederhake
b83cb09db1 Everywhere: Fix badly-formatted includes
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
2023-01-02 11:06:15 -05:00
Andreas Kling
f982400063 LibGfx: Rename TTF/TrueType to OpenType
OpenType is the backwards-compatible successor to TrueType, and the
format we're actually parsing in LibGfx. So let's call it that.
2022-12-21 08:44:22 +01:00
Rodrigo Tobar
bb48a67f84 LibPDF: Reset encryption key on failed user password attempt
When an attempt is made to provide the user password to a
SecurityHandler a user gets back a boolean result indicating success or
failure on the attempt. However, the SecurityHandler is left in a state
where it thinks it has a user password, regardless of the outcome of the
attempt. This confuses the rest of the system, which continues as if the
provided password is correct, resulting in garbled content.

This commit fixes the situation by resetting the internal fields holding
the encryption key (which is used to determine whether a user password
has been successfully provided) in case of a failed attempt.
2022-12-20 10:28:58 +01:00
Rodrigo Tobar
dc6a11cf6b LibPDF: Treat Encryption's Length item as optional
With the StandardSecurityHandler the Length item in the Encryption
dictionary is optional, and needs to be given only if the encryption
algorithm (V) is other than 1; otherwise we can assume a length of 40
bits for the encryption key.
2022-12-20 10:28:58 +01:00
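The rule reduces to a small helper. This sketch uses a hypothetical free function that takes the /V entry and the optional /Length entry in bits. For V values other than 1 it uses /Length when present and otherwise falls back to the 40-bit default.

```cpp
#include <optional>

// Hypothetical helper: key length in bits for the standard security handler.
// `v` is the /V entry, `length` the optional /Length entry.
int encryption_key_length_bits(int v, std::optional<int> length)
{
    // With V == 1 the key is always 40 bits long.
    if (v == 1)
        return 40;
    // Otherwise /Length is used when present, defaulting to 40 bits.
    return length.value_or(40);
}
```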
Rodrigo Tobar
6df9aa8f2c LibPDF: Store page number, not Value, in OutlineItem
The Value previously stored corresponded to a Reference to a Page object
in the PDF document. This isn't useful information, since what we want
to display at the end of the day is the page an outline item refers to.

This commit changes the page member on OutlineItem to be an Optional<u32>
(some destinations don't necessarily refer to a Page), which we resolve
while building OutlineItems.
2022-12-17 19:40:52 +01:00
Rodrigo Tobar
3db6af6360 LibPDF: Keep track of OutlineItem parents
While OutlineItem had a parent field, it was never populated nor used.
This commit populates it when possible (no parent means the OutlineItem
is a top-level item).
2022-12-17 19:40:52 +01:00
Rodrigo Tobar
c4bc27f274 LibPDF: Don't abort on unsupported drawing operations
Instead of calling TODO(), which will abort the program, we now return
an Error specifying that we haven't implemented the drawing operation
yet. This will now nicely trickle up all the way through to the
PDFViewer, which will then notify its clients about the problem.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
e87fecf710 LibPDF: Switch to best-effort PDF rendering
The current rendering routine aborts as soon as an error is found during
rendering, which potentially severely limits the contents we show on
screen. Moreover, whenever an error happens the PDFViewer widget shows
an error dialog, and doesn't display the bitmap that has been painted so
far.

This commit improves the situation on both fronts, implementing
rendering with a best-effort approach. Firstly, execution of
operations isn't halted after an operation results in an error;
instead, execution of all operations is always attempted, and all
collected errors are returned in bulk. Secondly, PDFViewer now always
displays the resulting bitmap, regardless of whether errors were
produced. To communicate errors, an on_render_errors callback has been added
so clients can subscribe to these events and handle them as appropriate.
2022-12-16 10:04:23 +01:00
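The best-effort loop can be sketched as follows. The Operation alias and this simplified Errors class are hypothetical stand-ins: each operation is a callable that returns an optional error message. Every operation runs regardless of earlier failures, and the caller receives all errors at once.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a drawing operation: returns an error message on failure.
using Operation = std::function<std::optional<std::string>()>;

// Simplified error accumulator.
class Errors {
public:
    void add_error(std::string message) { m_errors.push_back(std::move(message)); }
    bool is_empty() const { return m_errors.empty(); }
    std::vector<std::string> const& errors() const { return m_errors; }

private:
    std::vector<std::string> m_errors;
};

// Attempts every operation, recording failures instead of stopping at the first one.
Errors execute_best_effort(std::vector<Operation> const& operations)
{
    Errors errors;
    for (auto const& operation : operations) {
        if (auto error = operation())
            errors.add_error(std::move(*error));
    }
    return errors;
}
```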
Rodrigo Tobar
96fb4b20f1 LibPDF: Add Errors class that accumulates multiple errors
This will be used to perform a best-effort rendering, where an error in
rendering won't abort the whole rendering operation, but instead will be
stored for later reference while rendering continues.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
d9718064d1 LibPDF: Add support for multi-line comments
The code parsing comments parsed only a single line of comments, but
callers assumed they parsed all comments that appeared contiguously in a
block. The latter is an easier-to-understand API, so this commit changes
the parse_comment function to parse entire blocks of comments instead of
single lines.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
a1af79dca6 LibPDF: Follow a FontFile's Length values
These can be references (at least from what I've found in some
documents), so we want to resolve them before using them.
2022-12-16 01:24:43 -07:00
Rodrigo Tobar
cb1a7cc721 LibPDF: Simplify outline construction
While the Outline Items making up the document's Outline have all sorts
of cross-references (parent, first/last child, next/previous sibling,
etc), not all documents out there have fully-consistent references. Our
implementation already discarded some of that information too (e.g.,
/Parent and /Prev were never read), and trusted that /First and /Next
were good enough to traverse the whole hierarchy.

Where the current implementation failed was in assuming that /Last was
also a good source of information. There are documents out there where
/Last also points to dead ends, and these were causing a crash when
we verified that the last child found on a chain was the /Last child
declared by the parent. To fix this I'm simply removing the check, and
simplifying the function call to remove any references to /Last. This
way we affirm our commitment to /First and /Next as the main sources of
information.
2022-12-16 01:24:43 -07:00
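A sketch of a traversal that trusts only /First and /Next. It assumes a hypothetical flattened map from object numbers to outline entries. It stops on dangling references and, as an extra safeguard not mentioned in the commit, on cycles.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, flattened view of an outline item dictionary: only /Title,
// /First and /Next are kept, keyed by object number.
struct RawOutlineItem {
    std::string title;
    std::optional<int> first; // object number of the first child, if any
    std::optional<int> next;  // object number of the next sibling, if any
};

struct OutlineItem {
    std::string title;
    std::vector<OutlineItem> children;
};

// Builds a chain of siblings starting at `first`, recursing into each
// item's /First. /Last and /Prev are never consulted.
std::vector<OutlineItem> build_outline_items(std::map<int, RawOutlineItem> const& objects, std::optional<int> first, std::set<int>& visited)
{
    std::vector<OutlineItem> items;
    for (auto current = first; current.has_value();) {
        auto it = objects.find(*current);
        if (it == objects.end())
            break; // dangling reference: end this chain
        if (!visited.insert(*current).second)
            break; // already seen: avoid looping on cyclic references
        RawOutlineItem const& raw = it->second;
        items.push_back({ raw.title, build_outline_items(objects, raw.first, visited) });
        current = raw.next;
    }
    return items;
}
```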
Rodrigo Tobar
41bd304a7f LibPDF: Ignore seac PS1 commands for now
This command is meant to print a Standard Encoding Accented Character.
It's not critical to implement it yet, but if we want to render more
documents we need to handle the instruction, even if we simply ignore it.
2022-12-16 01:24:43 -07:00
Ali Mohammad Pur
f96a3c002a Everywhere: Stop shoving things into ::std and mentioning them as such
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.

std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
2022-12-14 11:44:32 +01:00
Rodrigo Tobar
adc45635e9 LibPDF: Add initial image display support
After adding support for XObject Form rendering, the next step was to display
XObject images. This commit adds this initial support.

Images come in many shapes and forms: encodings, color spaces, bits per
component, width, height, etc. This initial support is constrained to
the color spaces we currently support, to images that use 8 bits per
component, to images that do *not* use the JPXDecode filter, and that
are not Masks. There are surely other constraints that aren't considered
in this initial support, so expect breakage here and there.

In addition to supporting images, we also support applying an alpha mask
(SMask) on them. Additionally, a new rendering preference allows skipping
image loading and rendering altogether, instead showing an empty
rectangle as a placeholder (useful for when actual images are not
supported). Since RenderingPreferences is becoming a bit more complex,
we add a hash option that will allow us to keep track of different
preferences (e.g., in a HashMap).
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
2331fe5e68 LibPDF: Add first interpolation methods
Interpolation is needed in more than one place, and I couldn't find a
central place where I could borrow a readily available interpolation
routine, so I've implemented the first simple interpolation object. More
will follow for more complex scenarios.
2022-12-10 10:49:03 +01:00
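A minimal example of such an interpolation object. It assumes the simple linear mapping given by the PDF specification's Interpolate formula. The class name LinearInterpolation1D is hypothetical here, and the sketch assumes x_min != x_max.

```cpp
// Hypothetical interpolation object mapping [x_min, x_max] linearly onto
// [y_min, y_max]; assumes x_min != x_max.
class LinearInterpolation1D {
public:
    LinearInterpolation1D(float x_min, float x_max, float y_min, float y_max)
        : m_x_min(x_min)
        , m_x_max(x_max)
        , m_y_min(y_min)
        , m_y_max(y_max)
    {
    }

    // y = y_min + (x - x_min) * (y_max - y_min) / (x_max - x_min)
    float interpolate(float x) const
    {
        return m_y_min + (x - m_x_min) * (m_y_max - m_y_min) / (m_x_max - m_x_min);
    }

private:
    float m_x_min;
    float m_x_max;
    float m_y_min;
    float m_y_max;
};
```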
Rodrigo Tobar
17676705a5 LibPDF: Add facility to obtain Vector<float> from ArrayObject
Arrays of float numbers are common in many PDF objects, and thus to
avoid code repetition I'm introducing a new method to ArrayObject that
will return exactly that.
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
a63b93f724 LibPDF: Add new Error::Type for unsupported rendering features
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
26f8c0b76c LibPDF: Add more knowledge to ColorSpaces classes
ColorSpaces now can tell users how many components they expect, and the
default decode array that should be used when converting unit bit
sequences into color space component input values during image
rendering.
2022-12-10 10:49:03 +01:00
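The default decode array and the decoding step can be sketched as below. The sketch assumes interleaved unsigned samples and a color space whose default decode array is [0 1] per component. That holds for DeviceGray, DeviceRGB and DeviceCMYK, but not for Indexed, whose default is [0 2^bpc-1]. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Default decode array for a color space with `components` components:
// [0 1] once per component.
std::vector<float> default_decode_array(int components)
{
    std::vector<float> decode;
    for (int i = 0; i < components; ++i) {
        decode.push_back(0.0f);
        decode.push_back(1.0f);
    }
    return decode;
}

// Maps raw samples (interleaved components) to component values using
// D_min + x * (D_max - D_min) / (2^bits_per_component - 1).
std::vector<float> decode_samples(std::vector<std::uint32_t> const& samples, std::vector<float> const& decode, int bits_per_component)
{
    std::size_t const components = decode.size() / 2;
    float const max_sample = static_cast<float>((1u << bits_per_component) - 1);
    std::vector<float> values;
    values.reserve(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        float const d_min = decode[2 * (i % components)];
        float const d_max = decode[2 * (i % components) + 1];
        values.push_back(d_min + static_cast<float>(samples[i]) * (d_max - d_min) / max_sample);
    }
    return values;
}
```

Swapping a pair to [1 0] inverts that component, which is how a decode array can flip an image's polarity.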
Rodrigo Tobar
ba16310739 LibPDF: Refactor parsing of ColorSpaces
ColorSpaces can be specified in two ways: with a stream as operands of
the color space operations (CS/cs), or as a separate PDF object, which
is then referred to by other means (e.g., from Image XObjects and other
entities). These two modes of addressing a ColorSpace are slightly
different and need to be addressed separately. However, the current
implementation embedded the full logic of the first case in the routine
that created ColorSpace objects.

This commit refactors the creation of ColorSpace to support both cases.
First, a new ColorSpaceFamily class encapsulates the static aspects of a
family, like its name or whether color space construction never requires
parameters. Then we define the supported ColorSpaceFamily objects.

On top of this sits a split in how ColorSpaces are created. Two
methods are now offered: one only providing construction of no-argument
color spaces (and thus taking a simple name), and another taking an
ArrayObject, hence used to create ColorSpaces requiring arguments.

Finally, on top of *that*, two ways to get a color space in the Renderer
are made available: the first creates a ColorSpace with a name and a
Resources dictionary, and another takes an Object. These model the two
addressing modes described above.
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
287bb0feac LibPDF: Return results directly and avoid unpacking+packing
2022-12-10 10:49:03 +01:00
Andreas Kling
d6a3be1615 LibPDF: Add missing character quirk for WinAnsiEncoding fonts
Fonts with the encoding name "WinAnsiEncoding" should render missing
characters above character code 040 (octal) as a "bullet" character.

This patch adds Encoding::should_map_to_bullet(char_code) which is then
called by char_code_to_code_point() to check if the given char code
should be displayed as a bullet instead.

I didn't have a good way to test this, so I've only verified that it
works by manually overriding inputs to the function during the rendering
stage.

This takes care of a FIXME in the Annex D part of the PDF specification.
2022-12-08 09:54:20 +01:00
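The quirk can be sketched as follows. The map of explicitly defined char codes and both function signatures are hypothetical simplifications: the commit's Encoding::should_map_to_bullet() takes only the char code. Undefined codes above 040 octal map to U+2022 BULLET.

```cpp
#include <cstdint>
#include <map>

// U+2022 BULLET
constexpr char32_t bullet_code_point = 0x2022;

// Hypothetical helper: `defined` holds the char codes the encoding maps explicitly.
bool should_map_to_bullet(std::map<std::uint8_t, char32_t> const& defined, std::uint8_t char_code)
{
    // Only unused codes above 040 (octal, i.e. 32) fall back to the bullet.
    return char_code > 040 && defined.find(char_code) == defined.end();
}

char32_t char_code_to_code_point(std::map<std::uint8_t, char32_t> const& defined, std::uint8_t char_code)
{
    if (auto it = defined.find(char_code); it != defined.end())
        return it->second;
    if (should_map_to_bullet(defined, char_code))
        return bullet_code_point;
    return 0; // hypothetical "no glyph" marker
}
```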
MacDue
7be0b27dd3 Meta+Userland: Pass Gfx::IntPoint by value
This is just two ints, i.e. 8 bytes, which is the same size as a
reference on x86_64 or AArch64.
2022-12-07 11:48:27 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and to track down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
d26aabff04 Everywhere: Run clang-format
2022-12-03 23:52:23 +00:00
Rodrigo Tobar
cb3e05f476 LibPDF: Add initial implementation of XObject rendering
This implementation currently handles Form XObjects only, skipping
image XObjects. When rendering an XObject, its resources are passed to
the underlying operations so they use those instead of the Page's.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
b3007c17bd LibPDF: Allow operators to receive optional resources
Operators usually assume that the resources their operations will require
will be the Page's. This assumption breaks however when XObjects with
their own resources come into the picture (and maybe other cases too).
In that case, the XObject's resources take precedence, but they should
also contain the Page's resources. Because of this, one can safely use
the XObject resources alone when given, and default to the Page's if
not.

This commit adds an extra argument with optional resources to all
operator calls, which XObjects will fill in as necessary.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
e58165ed7a LibPDF: Render cubic bezier curves
The implementation of bezier curves already exists on Gfx, so
implementing the PDF rendering of this command is trivial.
2022-11-30 14:51:14 +01:00
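The commit reuses LibGfx's existing curve support. For reference, this sketch shows the underlying math for the PDF "c" operator, where the current point is p0 and the three operand pairs are p1, p2 and p3. It evaluates the Bernstein form and flattens the curve into a fixed number of line segments. The names and the fixed-segment strategy are assumptions, not LibGfx's approach.

```cpp
#include <vector>

struct Point {
    float x;
    float y;
};

// Evaluates the cubic Bezier curve with control points p0..p3 at t in [0, 1]
// using the Bernstein polynomial form.
Point cubic_bezier_point(Point p0, Point p1, Point p2, Point p3, float t)
{
    float const u = 1.0f - t;
    float const b0 = u * u * u;
    float const b1 = 3.0f * u * u * t;
    float const b2 = 3.0f * u * t * t;
    float const b3 = t * t * t;
    return { b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
             b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y };
}

// Approximates the curve with `segments` straight lines (segments + 1 points).
std::vector<Point> flatten_cubic_bezier(Point p0, Point p1, Point p2, Point p3, int segments)
{
    std::vector<Point> points;
    for (int i = 0; i <= segments; ++i) {
        float const t = static_cast<float>(i) / static_cast<float>(segments);
        points.push_back(cubic_bezier_point(p0, p1, p2, p3, t));
    }
    return points;
}
```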
Rodrigo Tobar
fe5c823989 LibPDF: Communicate resources to ColorSpace, not Page
Resources can come from other sources (e.g., XObjects), and since the
only attribute we are reading from Page are its resources it makes sense
to receive resources instead. That way we'll be able to pass down
arbitrary resources that are not necessarily declared at the page level.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
164422f8d8 LibPDF: Add further common names
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
5277ad1d6d LibPDF: Implement Run Length Decoding
This is a simple decoding process that is needed by some streams.
2022-11-30 14:51:14 +01:00
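RunLengthDecode, as described in the PDF specification, reads a length byte followed by data. A length from 0 to 127 means the next length + 1 bytes are copied literally. A length from 129 to 255 means the next byte is repeated 257 - length times. A length of 128 marks the end of the data. A minimal sketch (the function name is hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

std::vector<std::uint8_t> run_length_decode(std::vector<std::uint8_t> const& input)
{
    std::vector<std::uint8_t> output;
    std::size_t i = 0;
    while (i < input.size()) {
        std::uint8_t const length = input[i++];
        if (length == 128)
            break; // end-of-data marker
        if (length < 128) {
            // Literal run: copy the next length + 1 bytes unchanged.
            std::size_t const count = static_cast<std::size_t>(length) + 1;
            if (input.size() - i < count)
                throw std::runtime_error("truncated literal run");
            auto const begin = input.begin() + static_cast<std::ptrdiff_t>(i);
            output.insert(output.end(), begin, begin + static_cast<std::ptrdiff_t>(count));
            i += count;
        } else {
            // Repeat run: the next byte appears 257 - length times.
            if (i >= input.size())
                throw std::runtime_error("truncated repeat run");
            output.insert(output.end(), static_cast<std::size_t>(257 - length), input[i]);
            ++i;
        }
    }
    return output;
}
```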
Rodrigo Tobar
e776048309 LibPDF: Ignore whitespace on hex strings
The spec says that whitespace should be ignored, but we weren't ignoring it. PDFs
with whitespace in their hex strings were thus crashing the parser.
2022-11-30 14:51:14 +01:00
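A sketch of hex string decoding that skips whitespace. It assumes the input is the text between < and > and applies the specification's rule that an odd number of digits behaves as if a final 0 followed. The helper names are hypothetical.

```cpp
#include <optional>
#include <string>
#include <string_view>

// PDF whitespace characters: NUL, HT, LF, FF, CR and SPACE.
bool is_pdf_whitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Decodes the text between '<' and '>' of a hex string, skipping whitespace.
std::optional<std::string> decode_hex_string(std::string_view hex)
{
    std::string result;
    int high_nibble = -1; // -1 while no digit of the current byte is pending
    for (char c : hex) {
        if (is_pdf_whitespace(c))
            continue;
        int const value = hex_digit_value(c);
        if (value < 0)
            return std::nullopt;
        if (high_nibble < 0) {
            high_nibble = value;
        } else {
            result.push_back(static_cast<char>(high_nibble * 16 + value));
            high_nibble = -1;
        }
    }
    // An odd number of digits behaves as if a final 0 followed.
    if (high_nibble >= 0)
        result.push_back(static_cast<char>(high_nibble * 16));
    return result;
}
```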
Rodrigo Tobar
d04613d252 LibPDF: Fix path coordinates calculation
Path rendering was buggy because the map() function that translates
points from user space to bitmap space applied the vertical flip
conversion that the current transformation matrix already considers;
hence, all paths were upside down. The only exception was the "re"
instruction, which manually adjusted the Y coordinate of its points to
be flipped again (and had a FIXME saying that this should be
unnecessary).

This commit fixes the map() function that maps userspace points to
bitmap coordinates. The "re" operator implementation has also been
simplified by creating a rectangle first and mapping *that* instead of
mapping each point individually.
2022-11-26 08:56:35 +01:00
Rodrigo Tobar
e92ec26771 LibPDF: Introduce rendering preferences and show clipping paths
A new struct allows users to specify rendering preferences that
the Renderer class might use to paint some Document elements onto the
target bitmap. The first toggle allows rendering (or not) the clipping
paths on a page, which is useful for debugging.
2022-11-25 23:03:24 +01:00
Rodrigo Tobar
a1e36e8f78 LibPDF: Improve path clipping support
The existing path clipping support was broken, as it performed the
clipping operation as soon as the path clipping commands (W/W*) were
received. The correct behavior is to keep a clipping path in the
graphic state, *intersect* that with the current path upon receiving
W/W*, and apply the clipping when performing painting operations. On top
of that, the intersection happening at W/W* time does not affect the
painting operation happening on the path currently being built, but takes
effect only after the current path is cleared; therefore both a current and a
next clipping path need to be tracked.

Path clipping is not yet supported on the Painter class, nor is path
intersection. We thus continue using the same simplified bounding box
approach to calculate clipping paths.

Since now we are dealing with more rectangles-as-path code, I've made
helper functions to build a rectangle path and reuse it as needed.
2022-11-25 23:03:24 +01:00
Julian Offenhäuser
d1bc89e30b LibPDF: Try to repair XRef tables with broken indices
An XRef table usually starts with an object number of zero. While it
could technically start at any other number, this is a tell-tale sign
of a broken table.

For the "broken" documents I encountered, this always meant that some
objects must have been removed from the start of the table, without
updating the following indices. When this is the case, the document
cannot be read normally.

However, most other PDF parsers seem to know of this quirk and fix the
XRef table automatically.

Likewise, we now check for this exact case, and if it matches up with
what we expect, we update the XRef table such that all object numbers
match the actual objects found in the file again.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
e06a065594 LibPDF: Override Type 1 character mappings by encoding in font dict
If the font dictionary includes an "Encoding" entry, it will be used
instead of the PS1FontProgram's built-in encoding.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
65ff80e8a5 LibPDF: Add alternative names to is_standard_latin_font() helper
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
9cb3b23377 LibPDF: Move all font handling to Type1Font and TrueTypeFont classes
It was previously the job of the renderer to create fonts, load
replacements for the standard 14 fonts and to pass the font size back
to the PDFFont when asking for glyph widths.

Now, the renderer tells the font its size at creation, as it doesn't
change throughout the life of the font. The PDFFont itself is now
responsible for deciding whether it needs to use a replacement
font, which still is Liberation Serif for now.

This means that we can now render embedded TrueType fonts as well :^)

It also makes the renderer's job much simpler and leads to a much
cleaner API design.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
e748a94f80 LibPDF: Introduce loading of common font data in PDFFont base class
This font data is shared between Type 1 and TrueType fonts, which is
why we can now load it in the base class that they both use.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
dd82a026f8 LibPDF: Pass PDFFont::draw_glyph() a char code instead of a code point
We would previously pass this function a Unicode code point, which is
not actually what we want here.

Instead, we want the "raw" code point, with the font itself deciding
whether or not it needs to be re-mapped.

This same mistake in terminology applied to PS1FontProgram.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
8532ca1b57 LibPDF: Convert dash pattern array elements to integers if necessary
They may be floats instead.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
0bc3333740 LibPDF: Parse integer numbers with atoi() instead of strtof()
strtof() produces rounding errors for very large numbers, which we
don't want for integers, as they may have to be precise.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
c2ad29c85f LibPDF: Implement PNG predictor decoding for flate filter
For flate and LZW filters, the data can be transformed by this
predictor function to make it compress better. For us this means that
we have to undo this step in order to get the right result.

Although this feature is meant for images, I found at least a few
documents that use it all over the place, making this step very
important.
2022-11-19 15:42:08 +01:00
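A sketch of undoing the PNG predictors (Predictor values of 10 and above). It assumes the caller has already computed bytes_per_row as ceil(Colors * BitsPerComponent * Columns / 8) and bytes_per_pixel as ceil(Colors * BitsPerComponent / 8). Each row starts with a PNG filter-type byte: 0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth. The function names are hypothetical, and the TIFF predictor (Predictor 2) is not covered.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <vector>

// Paeth predictor from the PNG specification.
std::uint8_t paeth_predictor(std::uint8_t a, std::uint8_t b, std::uint8_t c)
{
    int const p = int(a) + int(b) - int(c);
    int const pa = std::abs(p - int(a));
    int const pb = std::abs(p - int(b));
    int const pc = std::abs(p - int(c));
    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}

// Reverses PNG-style prediction. `data` holds rows of 1 + bytes_per_row bytes,
// each starting with a filter type byte.
std::vector<std::uint8_t> undo_png_predictor(std::vector<std::uint8_t> const& data, std::size_t bytes_per_row, std::size_t bytes_per_pixel)
{
    std::size_t const stride = bytes_per_row + 1;
    if (bytes_per_row == 0 || data.size() % stride != 0)
        throw std::runtime_error("data is not a whole number of rows");

    std::vector<std::uint8_t> output;
    output.reserve(data.size() / stride * bytes_per_row);
    std::vector<std::uint8_t> previous(bytes_per_row, 0); // the row above the first row is all zeros
    std::vector<std::uint8_t> current(bytes_per_row);

    for (std::size_t row_start = 0; row_start < data.size(); row_start += stride) {
        std::uint8_t const filter_type = data[row_start];
        for (std::size_t i = 0; i < bytes_per_row; ++i) {
            std::uint8_t const raw = data[row_start + 1 + i];
            std::uint8_t const left = i >= bytes_per_pixel ? current[i - bytes_per_pixel] : 0;
            std::uint8_t const up = previous[i];
            std::uint8_t const up_left = i >= bytes_per_pixel ? previous[i - bytes_per_pixel] : 0;
            int predicted = 0;
            switch (filter_type) {
            case 0: predicted = 0; break; // None
            case 1: predicted = left; break; // Sub
            case 2: predicted = up; break; // Up
            case 3: predicted = (int(left) + int(up)) / 2; break; // Average
            case 4: predicted = paeth_predictor(left, up, up_left); break; // Paeth
            default: throw std::runtime_error("invalid PNG filter type");
            }
            // Arithmetic is modulo 256, as in PNG.
            current[i] = static_cast<std::uint8_t>(raw + predicted);
        }
        output.insert(output.end(), current.begin(), current.end());
        previous = current;
    }
    return output;
}
```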
Julian Offenhäuser
4bd79338e8 LibPDF: Fix off-by-one error in Reader::remaining()
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
4b1a72ff7a LibPDF: Fix loop condition in parse_xref_stream()
We previously compared two unrelated values to determine if we parsed
the xref table to completion. We now check if we added every subsection
instead, and double check to make sure we never read past the end.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
a17a23a3f0 LibPDF: Make some variable names in parse_xref_stream() more clear
I found these to be a bit misleading.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
f926dfe36b LibPDF: Implement the DCT filter
This filter basically tells us that we are dealing with a JPEG.
Note that by serializing the resulting image we assume that this filter
is the last one in the chain; anything else would be highly unlikely.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
baaf42360e LibPDF: Derive alternate ICC color space from the number of components
We currently don't support ICC color spaces and fall back to a "simple"
one instead.

If no alternative is specified, however, we are allowed to pick the
closest match based on the number of color components.
2022-11-19 15:42:08 +01:00
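The fallback boils down to a mapping from the /N entry to a device color space. Here is a sketch with a hypothetical function name:

```cpp
#include <optional>
#include <string>

// Hypothetical helper: picks a device color space from the /N entry of an
// ICCBased color space that lacks an /Alternate entry.
std::optional<std::string> default_alternate_for_icc(int number_of_components)
{
    switch (number_of_components) {
    case 1:
        return "DeviceGray";
    case 3:
        return "DeviceRGB";
    case 4:
        return "DeviceCMYK";
    default:
        return std::nullopt; // /N is expected to be 1, 3 or 4
    }
}
```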
Julian Offenhäuser
16ed407c01 LibPDF: Support cascading stream filters
You can specify multiple filters as an array, where each one is fed the
output of the one before it.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
becd648a78 LibPDF: Parse hexadecimal values in name objects correctly
2022-11-19 15:42:08 +01:00
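Since PDF 1.2, a name may encode any byte as # followed by two hexadecimal digits. For example, /Lime#20Green is the name "Lime Green". The sketch below decodes such a name and assumes the leading / has already been stripped. The function names are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Decodes #xx escapes in a name (without the leading '/').
std::optional<std::string> decode_name(std::string_view raw)
{
    std::string result;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '#') {
            result.push_back(raw[i]);
            continue;
        }
        // '#' must be followed by two hexadecimal digits.
        if (i + 2 >= raw.size())
            return std::nullopt;
        int const high = hex_digit_value(raw[i + 1]);
        int const low = hex_digit_value(raw[i + 2]);
        if (high < 0 || low < 0)
            return std::nullopt;
        result.push_back(static_cast<char>(high * 16 + low));
        i += 2;
    }
    return result;
}
```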
Julian Offenhäuser
7c4f5b58be LibPDF: Use Gfx::PathRasterizer for Adobe Type 1 font rendering
This gives much better visual results than painting the path directly.
It also has the nice side effect that Type 1 fonts will now look much
more similar to TrueType fonts, which use the same class :^)

In addition, we can now cache glyph bitmaps for repeated use.
2022-11-19 11:04:34 +01:00
Tim Schumacher
ce2f1b845f Everywhere: Mark dependencies of most targets as PRIVATE
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.

Also included are changes to re-add now-missing dependencies to tools
that actually need them.
2022-11-01 14:49:09 +00:00
Tim Schumacher
7834e26ddb Everywhere: Explicitly link all binaries against the LibC target
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.

This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.

Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
2022-11-01 14:49:09 +00:00
Julian Offenhäuser
b14f0950a5 LibPDF: Add very basic support for Adobe Type 1 font rendering
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.

We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
e6f29302a7 LibPDF: Add glyph drawing and type info methods to PDFFont
A PDFFont can now be asked for its specific type and whether it is part
of the standard 14 fonts. It now also contains a method to draw a
glyph, which is stubbed-out for now.

This will be useful for the renderer to take into consideration when
drawing text, since we don't include replacements for the standard set
of fonts yet, but still want to make use of embedded fonts when
available.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
36f83cecab LibPDF: Allow page objects to inherit the MediaBox and Resources entries
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
2f71e0f09a LibPDF: Allow text operator sequences to start with whitespace
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
7ecd420b03 LibPDF: Parse floating point numbers that omit a leading zero correctly
2022-10-16 17:44:54 +02:00
Ben Wiederhake
a99cd09891 Libraries: Add missing includes, add namespace qualifiers
This remained undetected for a long time as HeaderCheck is disabled by
default. This commit makes the following file compile again:

    // file: compile_me.cpp
    #include <LibDNS/Question.h>
    // That's it, this was enough to cause a compilation error.

Likewise for most other files touched by this commit.
2022-09-18 13:27:24 -04:00
Julian Offenhäuser
77f5f7a6f4 LibPDF: Support parsing page tree nodes that are in object streams
conditionally_parse_page_tree_node used to assume that the xref table
contained a byte offset, even for compressed objects. It now uses the
common facilities for parsing objects, at the expense of some
performance.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
6225c03256 LibPDF: Rename argument for the latin character set enumeration macro
The previous name "V" collided with one of the entries.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
633e1632d0 LibPDF: Allow whitespace other than EOL after an object marker
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
65e83bed53 LibPDF: Disallow parsing indirect values as operands
An operation like 0 0 0 RG would have been confused for [ 0, 0 0 R ] G
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
04cb00dc9a LibPDF: Fix handling of differences array in custom encodings
When looking up differences in the specified encoding, we previously
didn't recognize a lot of characters, namely those that are referred to
by a string in the PDF itself, like "/germandbls".

We now create a mapping of those characters to the code points they are
referring to, and correctly look them up when needed.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
36828f1385 LibPDF: Don't expect glyph width arrays to contain integers
They might also contain floats, in which case we convert them to int
before use.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
97ed4106e5 LibPDF: Fix text positioning with operator TJ
As per spec, the positioning (or kerning) parameter of this operator
should translate the text matrix before the next showing of text.
Previously, this calculation was slightly wrong and also only applied
after the text was already shown.
2022-09-17 10:07:14 +01:00
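In horizontal writing mode, the specification translates the text matrix by tx = -(number / 1000) * font_size * horizontal_scaling for each number in a TJ array, before the next glyphs are shown. Here is a sketch with hypothetical names, where horizontal_scaling is the Tz value divided by 100:

```cpp
// Hypothetical 2D text matrix [a b c d e f], as used by PDF's Tm.
struct TextMatrix {
    float a, b, c, d, e, f;
};

// Horizontal displacement for a number in a TJ array (horizontal writing mode).
// The number is in thousandths of text space units and is subtracted.
float tj_displacement(float number, float font_size, float horizontal_scaling)
{
    return -(number / 1000.0f) * font_size * horizontal_scaling;
}

// Computes [1 0 0 1 tx 0] x Tm, i.e. moves the text origin by tx along the text's x axis.
TextMatrix translate_text_matrix(TextMatrix m, float tx)
{
    m.e += tx * m.a;
    m.f += tx * m.b;
    return m;
}
```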
Julian Offenhäuser
563d91b6c4 LibPDF: Implement loading compressed objects from object streams
Now, whenever the xref table points to a compressed object,
parse_object_with_index will look it up in the corresponding object
stream as if it were a regular object.

With this, our parser gains the bare minimum support for xref streams.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
f9beff7b5e LibPDF: Initial work on parsing xref streams
Since PDF version 1.5, a document may omit the xref table in favor of
a new kind of xref stream object. This is used to reference so-called
"compressed" objects that are part of an object stream.

With this patch we are able to parse this new kind of xref object, but
we'll have to implement object streams to use them correctly.
2022-09-17 10:07:14 +01:00
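Each entry in a decoded xref stream consists of three big-endian fields whose byte widths come from the stream's /W array. A width of 0 means the field is absent and takes its default value, which is type 1 for the first field. The sketch below uses hypothetical names, assumes 0 as the default for the other two fields, and ignores /Index subsections.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One decoded entry: type 0 = free, 1 = uncompressed object at a byte offset,
// 2 = compressed object stored inside an object stream.
struct XRefStreamEntry {
    std::uint64_t type;
    std::uint64_t field2;
    std::uint64_t field3;
};

std::vector<XRefStreamEntry> parse_xref_stream_entries(std::vector<std::uint8_t> const& data, std::array<std::size_t, 3> const& widths)
{
    std::size_t const entry_size = widths[0] + widths[1] + widths[2];
    if (entry_size == 0 || data.size() % entry_size != 0)
        throw std::runtime_error("xref stream size does not match /W");

    std::size_t offset = 0;
    // Reads a big-endian field of `width` bytes, or returns the default if width is 0.
    auto read_field = [&](std::size_t width, std::uint64_t default_value) {
        if (width == 0)
            return default_value;
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < width; ++i)
            value = (value << 8) | data[offset++];
        return value;
    };

    std::vector<XRefStreamEntry> entries;
    while (offset < data.size()) {
        XRefStreamEntry entry {};
        entry.type = read_field(widths[0], 1); // missing type field defaults to 1
        entry.field2 = read_field(widths[1], 0);
        entry.field3 = read_field(widths[2], 0); // assumption: 0 as default
        entries.push_back(entry);
    }
    return entries;
}
```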
Julian Offenhäuser
4887aacec7 LibPDF: Move document-specific parsing functionality into its own class
The Parser class is now a generic PDF object parser, from which the new
DocumentParser class derives. DocumentParser now takes over all
functions relating to linearization, pages, xref and trailer handling.

This allows the use of multiple parsers in the same document's
context, which will be needed in order to handle PDF object streams.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
9f4659cc63 LibPDF: Move consume and match helper functions to the Reader class
2022-09-17 10:07:14 +01:00
Ben Wiederhake
ff96747e17 LibPDF: Break inclusion cycle by removing unnecessary include
2022-09-17 04:00:54 +00:00
sin-ack
c8585b77d2 Everywhere: Replace single-char StringView op. arguments with chars
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
7456904a39 Meta+Userland: Simplify some formatters
These are mostly minor mistakes I've encountered while working on the
removal of StringView(char const*). The usage of builder.put_string over
Format<FormatString>::format is preferable as it will avoid the
indirection altogether when there's no formatting to be done. Similarly,
there is no need to do format(builder, "{}", number) when
builder.put_u64(number) works equally well.

Additionally a few Strings where only constant strings were used are
replaced with StringViews.
2022-07-12 23:11:35 +02:00
Linus Groh
173dcfb7cb Everywhere: Fix a bunch of typos
2022-05-29 15:22:00 +02:00
Simon Wanner
5136c5ae1a LibGfx: Move ScaledFont and new base class VectorFont out of TTF
2022-04-09 23:48:18 +02:00
Simon Wanner
206d6ece55 LibGfx: Move other font-related files to LibGfx/Font/
2022-04-09 23:48:18 +02:00
Simon Wanner
6f8fd91f22 LibGfx: Move TTF files from TrueTypeFont/ to Font/TrueType/
2022-04-09 23:48:18 +02:00
Matthew Olsson
3ecb41b7d9 PDFViewer: Support a continuous page view mode
2022-04-04 14:59:37 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format
2022-04-01 21:24:45 +01:00
Matthew Olsson
b69488031b LibPDF: Fix mismatched class/struct declaration 2022-03-31 18:10:45 +02:00
Matthew Olsson
468ceb1b48 LibPDF: Rename Command to Operator
This is the correct name, according to the spec
2022-03-31 18:10:45 +02:00
Matthew Olsson
49cb040c27 LibPDF: Fix some base-encoding-related crashes 2022-03-31 18:10:45 +02:00
Matthew Olsson
4d0f74a15c LibPDF: Add Type0 and TrueType fonts 2022-03-31 18:10:45 +02:00
Matthew Olsson
e831c374f4 LibPDF: Abstract Type1 font data
TTF font types will use the same data
2022-03-31 18:10:45 +02:00
Matthew Olsson
058cf5f7f7 LibPDF: Accept font size in PDFFont::get_char_width
This will be required for TTF fonts
2022-03-31 18:10:45 +02:00
Matthew Olsson
5f9d35909d LibPDF: Move font files into their own directory 2022-03-31 18:10:45 +02:00
Matthew Olsson
d2771eafc5 LibPDF: Use Font /Widths array to derive character widths
This makes the spacing between chars _much_ better!
2022-03-31 18:10:45 +02:00
Matthew Olsson
130846f337 LibPDF: Use AntiAliasingPainter in Renderer when possible 2022-03-31 18:10:45 +02:00
Matthew Olsson
8224ca6150 LibPDF: Fix more bad Renderer text positioning calculations 2022-03-31 18:10:45 +02:00
Matthew Olsson
34efc668d2 LibPDF: Handle SCN and scn operators 2022-03-31 18:10:45 +02:00
Matthew Olsson
e1115cfe48 LibPDF: Add basic ICCBased color space handling 2022-03-31 18:10:45 +02:00
Matthew Olsson
1238e65d30 LibPDF: Move color space creation from Renderer to ColorSpace 2022-03-31 18:10:45 +02:00
Matthew Olsson
4e81663b31 LibPDF: Attempt to decrypt strings and streams 2022-03-29 02:52:57 +02:00
Matthew Olsson
60c3e786be LibPDF: Require Document* in Parser constructor
This makes it a bit easier to avoid calling parser->set_document, an
issue which cost me ~30 minutes to find.
2022-03-29 02:52:57 +02:00
Matthew Olsson
a8de9cf541 LibPDF: Keep track of the current object index/generation while Parsing
This information is required to decrypt encrypted strings/streams.
2022-03-29 02:52:57 +02:00
Matthew Olsson
5b316462b2 LibPDF: Add implementation of the Standard security handler
Security handlers manage encryption and decryption of PDF files. The
standard security handler uses RC4/MD5 to perform its crypto (AES as
well, but that is not yet implemented).
2022-03-29 02:52:57 +02:00
Matthew Olsson
c98bda8ce6 LibPDF: Get rid of PlainText/Encoded StreamObject
This was a small optimization to allow a stream object to simply hold
a reference to the bytes in a PDF document rather than duplicating
them. However, as we move into features such as encryption, this
optimization does more harm than good. This can be revisited in the
future if necessary.
2022-03-29 02:52:57 +02:00
Matthew Olsson
15b7999313 LibPDF: Change CommonNames' enumerator macro parameter name
Apparently "V" is a PDF property. Let's hope "A" isn't!
2022-03-29 02:52:57 +02:00
Matthew Olsson
9a4a3318a9 LibPDF: Store a PDFFont in the Renderer's text state 2022-03-29 02:52:57 +02:00
Matthew Olsson
0624472768 LibPDF: Add initial support for Type1 fonts
This is enough to get a char code -> code point mapping
2022-03-29 02:52:57 +02:00
Matthew Olsson
8441fa2bc4 LibPDF: Add support for builtin and custom Encodings 2022-03-29 02:52:57 +02:00
Matthew Olsson
6133acb8c0 LibPDF: Allow newlines between xref table and "trailer" keyword 2022-03-07 10:53:57 +01:00
Matthew Olsson
4d509ff365 LibPDF: Fix "incorrect" matrix multiplication in Renderer
Incorrect is in quotes because the spec (both 1.7 and 2.0) specifies this
multiplication as it was originally! However, flipping the order of
operations here makes the text in all of my test cases render in the
correct position.

The CTM is a transformation matrix between the text coordinate system
and the device coordinate system. However, being on the right-side of
the multiplication means that the CTM scale parameters don't have any
influence on the translation component of the left-side matrix. This
oddity is what originally led to me just trying this change to see if
it worked.
2022-03-07 10:53:57 +01:00
Matthew Olsson
6f1cfcf217 LibPDF: Implement marked renderer operations as nops 2022-03-07 10:53:57 +01:00
Matthew Olsson
544e44eec1 LibPDF: Fix bad hex string parsing logic 2022-03-07 10:53:57 +01:00
Matthew Olsson
3cfecc3d3b LibPDF: Remove useless hex string substring call 2022-03-07 10:53:57 +01:00
Matthew Olsson
e9342183f0 LibPDF: Support all Dest types 2022-03-07 10:53:57 +01:00
Matthew Olsson
b240d23a87 LibPDF: Propagate errors in Renderer/PDFViewer 2022-03-07 10:53:57 +01:00
Matthew Olsson
d82bd885ce LibPDF: Propagate ColorSpace errors 2022-03-07 10:53:57 +01:00
Matthew Olsson
73cf8205b4 LibPDF: Propagate errors in Parser and Document 2022-03-07 10:53:57 +01:00
Matthew Olsson
7e1c823725 LibPDF: Fix the zoom-related text scaling issue
Previously, text spacing on a page would only look correct on very
zoomed-in pages. When the page was zoomed out, the spacing between
characters was very large. The cause for this was incorrect initial
values for the Tc (character spacing) and Tw (word spacing) text
parameters. The initial values were too large, but they were only
about 3-5 pixels, which is why the error was only observable for
smaller pages.

The text placement still isn't perfect, but it is _much_ better!
2022-03-07 10:53:57 +01:00
Matthew Olsson
c1aa8c4a44 LibPDF: Remove unused function in Parser 2022-03-07 10:53:57 +01:00
Sam Atkins
fa3c61cf5a LibPDF: Make Filter::decode() return ErrorOr 2022-01-24 22:36:09 +01:00
Sam Atkins
f590cd1850 AK+Userland: Make AK::decode_hex() return ErrorOr
This lets us propagate the reason why it failed up to the caller. :^)
2022-01-24 22:36:09 +01:00
Sam Atkins
45cf40653a Everywhere: Convert ByteBuffer factory methods from Optional -> ErrorOr
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
2022-01-24 22:36:09 +01:00
Simon Woertz
c857b5d22f LibPDF: Convert PDF::Parser::m_document from RefPtr to WeakPtr
Otherwise both `PDF::Document` and `PDF::Parser` have a `RefPtr`
pointing to each other which leads to a memory leak due to a circular
dependency.
2022-01-08 18:57:55 +01:00
Andreas Kling
216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Simon Woertz
b87ab989a3 LibPDF: Check if there is data left before consuming
Add a check to `Parser::consume_eol` to ensure that there is more data
to read before actually consuming any data. Not checking if there is
data left leads to failing an assertion in case of e.g., a truncated
PDF file.
2021-11-16 00:16:57 +01:00
Ali Mohammad Pur
bf59d9e824 Userland: Include Vector.h in a few places to make HeaderCheck happy
This header was being transitively pulled in, but that no longer happens
after 5f7d008791.
2021-11-11 20:36:36 +01:00
Andreas Kling
80d4e830a0 Everywhere: Pass AK::ReadonlyBytes by value 2021-11-11 01:27:46 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling
a15ed8743d AK: Make ByteBuffer::try_* functions return ErrorOr<void>
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
2021-11-10 21:58:58 +01:00
Brendan Coles
6ccfa3e75e LibPDF: Parser::parse_header() return false if remaining bytes is zero 2021-10-30 17:34:56 +02:00
Ben Wiederhake
98a0f9c0bd LibPDF: Rely on default-constructor of Variant 2021-09-21 04:22:52 +04:30
Ben Wiederhake
f84a7e2e22 LibPDF: Replace Value class by AK::Variant
This decreases the memory consumption by LibPDF by 4 bytes per Value,
compensating exactly for the increase in an earlier commit. :^)
2021-09-20 17:39:36 +04:30
Ben Wiederhake
d344253b08 LibPDF: Extract reference bitpacking into dedicated class 2021-09-20 17:39:36 +04:30
Ben Wiederhake
da170997d5 LibPDF: Move inline function definition
This breaks the dependency cycle between Parser and Document.
2021-09-20 17:39:36 +04:30
Ben Wiederhake
edc0cd29f8 LibPDF: Break weird dependency cycle
Old situation:
Object.h defines Object
Object.h defines ArrayObject
ArrayObject requires the definition of Object
ArrayObject requires the definition of Value
Value.h defines Value
Value requires the definition of Object

Therefore, a file with the single line "#include <Value.h>" used to
raise compilation errors; certainly not something that one might expect
from a library.

This patch splits up the definitions in Object.h to break the cycle.
Now, Object.h only defines Object, Value.h still only defines Value (and
includes Object.h), and the new header ObjectDerivatives.h defines
ArrayObject (and includes both Object.h and Value.h).
2021-09-20 17:39:36 +04:30
Ben Wiederhake
7ddd11729d LibPDF: Add missing headers in Value.h 2021-09-20 17:39:36 +04:30
Ben Wiederhake
35674b8a42 LibPDF: Fix math error in comments 2021-09-20 17:39:36 +04:30
Ben Wiederhake
750bed254f LibPDF: Switch to automatic ref counting, fix memory leak
At least `Value::operator=` didn't properly unref the `PDF::Object` when
it was called. This type of problem is removed by just letting `RefPtr`
do its thing.

This patch increases the memory consumption by LibPDF by 4 bytes (the
other union objects) per value.
2021-09-20 17:39:36 +04:30
Ben Wiederhake
05006e63c4 LibPDF: Add missing headers to XRefTable.h 2021-09-20 17:39:36 +04:30
Ben Wiederhake
6089c4d97d LibPDF: Add missing headers to Reader.h 2021-09-20 17:39:36 +04:30
Ben Wiederhake
6836ca2136 LibPDF: Add missing headers to Forward.h 2021-09-20 17:39:36 +04:30
Brian Gianforcaro
507effce5b LibPDF: Use move to avoid unnecessary ref/unref of network device RefPtr
Flagged by pvs-studio as a potential perf optimization.
2021-09-16 17:17:13 +02:00
Ali Mohammad Pur
97e97bccab Everywhere: Make ByteBuffer::{create_*,copy}() OOM-safe 2021-09-06 01:53:26 +02:00
Ali Mohammad Pur
3a9f00c59b Everywhere: Use OOM-safe ByteBuffer APIs where possible
If we can easily communicate failure, let's avoid asserting and report
failure instead.
2021-09-06 01:53:26 +02:00
Daniel Bertalan
d7b6cc6421 Everywhere: Prevent risky implicit casts of (Nonnull)RefPtr
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
  `Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
  in LibIMAP/Client.cpp)

This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
2021-09-03 23:20:23 +02:00
Hendiadyoin1
ed46d52252 Everywhere: Use AK/Math.h if applicable
AK's version should see better inlining behavior than the LibM one.
We avoid mixed usage for now though.

Also clean up some stale math includes and improper floating-point usage.
2021-07-19 16:34:21 +04:30
Wesley Moret
1b8f73b6b3 LibPDF: Fix treating not finding the linearized dict as a fatal error
We now try to parse the first indirect value and see
if it's the `Linearization Parameter Dictionary`. If it's not, we
fall back to reading the xref table from the end of the document.
2021-07-16 20:44:10 +02:00
Wesley Moret
5d4d70355e LibPDF: Fix checking minor_ver instead of major_ver 2021-07-16 20:44:10 +02:00
Matthew Olsson
612b183703 LibPDF: Convert to east-const to comply with the recent style changes 2021-06-12 22:45:01 +04:30
Matthew Olsson
0a4d8ef98d LibPDF: Bake the flipped y-axis directly into the CTM matrix 2021-06-12 22:45:01 +04:30
Matthew Olsson
449ef14895 LibPDF: Avoid calculating rendering matrix for every glyph 2021-06-12 22:45:01 +04:30
Matthew Olsson
c142dadbe8 LibPDF: Handle the TJ graphical operator 2021-06-12 22:45:01 +04:30
Matthew Olsson
47531619e3 LibPDF: Handle the gs graphical operator 2021-06-12 22:45:01 +04:30
Matthew Olsson
006f5498de LibPDF: Add support for the CalRGB ColorSpace
This isn't tested all that well, as the PDF I am testing with only uses
it for black (which is trivial). It can be tested further when LibPDF
is able to process more complex PDFs that actually use this color space
non-trivially.
2021-06-12 22:45:01 +04:30
Matthew Olsson
7b4e36bf88 LibPDF: Split ColorSpace into a different class for each color space
While unnecessary at the moment, this will allow for more fine-grained
control when complex color spaces get added.
2021-06-12 22:45:01 +04:30
Matthew Olsson
ea3abb14fe LibPDF: Parse hint tables
This code isn't _actually_ used as of right now, but I wrote it at the
same time as all of the code in the previous commit. I realized after
I wrote it that these hint tables aren't super useful if the parser
already has access to the full file. However, this will be useful if
we ever want to stream PDFs from the web (and possibly view them in
the browser).
2021-06-12 22:45:01 +04:30
Matthew Olsson
e23bfd7252 LibPDF: Parse linearized PDF files
This is a big step, as most PDFs which are downloaded online will be
linearized. Pretty much the only difference is that the xref structure
is slightly different.
2021-06-12 22:45:01 +04:30
Matthew Olsson
be1be47613 LibPDF: Fix two parser bugs
- A newline was assumed to follow the "stream" keyword, when it can also
  be a Windows-style line break
- Fix not consuming the "endobj" at the end of every indirect object
2021-06-12 22:45:01 +04:30
Matthew Olsson
78bc9d1539 LibPDF: Refine the distinction between the Document and Parser
The Parser should hold information relevant for parsing, whereas the
Document should hold information relevant for displaying pages.
With this in mind, there is no reason for the Document to hold the
xref table and trailer. These objects have been moved to the Parser,
which allows the Parser to expose fewer public methods (which will be
even more evident once linearized PDFs are supported).
2021-06-12 22:45:01 +04:30
Matthew Olsson
cafd7c11b4 LibPDF: Account for inverted y axis when rendering text 2021-06-12 22:45:01 +04:30
Matthew Olsson
1ef5071d1b LibPDF: Harden the document/parser against errors 2021-06-12 22:45:01 +04:30
Matthew Olsson
d654fe0e41 LibPDF: Differentiate Value's null and empty states 2021-06-12 22:45:01 +04:30
Ali Mohammad Pur
51c2c69357 AK+Everywhere: Disallow constructing Functions from incompatible types
Previously, AK::Function would accept _any_ callable type, and try to
call it when called, first with the given set of arguments, then with
zero arguments, and if all of those failed, it would simply not call the
function and **return a value-constructed Out type**.
This led to many, many, many hard-to-debug situations when someone
forgot a `const` in their lambda argument types, and many cases of
people taking zero arguments in their lambdas to ignore them.
This commit reworks the Function interface to not include any such
surprising behaviour; if your function instance is not callable with
the declared argument set of the Function, it can simply not be
assigned to that Function instance, end of story.
2021-06-06 00:27:30 +04:30
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Matthew Olsson
78f3bad7e6 LibPDF: Pre-initialize common FlyStrings in CommonNames.h 2021-05-25 00:24:09 +04:30
Matthew Olsson
67b65dffa8 LibPDF: Handle string encodings
Strings can be encoded in either UTF-16BE or UTF-8. In either case,
there are a few initial bytes which specify the encoding that must
be checked and also removed from the final string.
2021-05-25 00:24:09 +04:30
Matthew Olsson
a08922d2f6 LibPDF: Parse outline structures 2021-05-25 00:24:09 +04:30
Matthew Olsson
be6e4b6f3c LibPDF: Store indirect value refs in Value objects
IndirectValueRef is so simple that it can be stored directly in the
Value class instead of being heap allocated.

As the comment in Value says, however, in theory the max bits needed to
store is 48 (16 for the generation index and 32(?) for the object
index), but 32 should be good enough for now. We can increase it to u64
later if necessary.
2021-05-25 00:24:09 +04:30
Matthew Olsson
534a2e95d2 LibPDF: Add basic color space support to the renderer
This commit only supports the three most basic color spaces:
DeviceGray, DeviceRGB, and DeviceCMYK
2021-05-25 00:24:09 +04:30
Matthew Olsson
f2d2f3fae7 LibPDF: Add a very poor path clipping implementation
This completely ignores the actual path and just uses its bounding box,
since our painter doesn't support clipping to paths.
2021-05-25 00:24:09 +04:30
Matthew Olsson
bf96ad674c LibPDF: Implement stubs for all graphical commands 2021-05-25 00:24:09 +04:30
Matthew Olsson
477e3946e5 LibPDF: Add support for stream filters
This commit also splits up StreamObject into PlainTextStreamObject and
EncodedStreamObject, which is essentially just a stream object which
does not own its bytes vs one which does.
2021-05-25 00:24:09 +04:30
Matthew Olsson
97cc482087 LibPDF: Make Reader::dump_state a bit more readable 2021-05-25 00:24:09 +04:30
Matthew Olsson
8c7ebc7a3f LibPDF: Do not assume value is an object in parse_indirect_value 2021-05-25 00:24:09 +04:30
Matthew Olsson
d5f94aaa7b LibPDF/PDFViewer: Support rotated pages 2021-05-18 16:35:23 +02:00
Matthew Olsson
f7ea1eb610 Applications: Add a very simple PDFViewer 2021-05-18 16:35:23 +02:00
Matthew Olsson
309105678b LibPDF: Fix bad resolve_to calls in Document
We can't call resolve_to with IndirectValue{,Ref}, since the values
will obviously be resolved, and will not give us the object of the
correct type.
2021-05-18 16:35:23 +02:00
Matthew Olsson
4479c1bff0 LibPDF: Add a bitmap renderer
This commit adds the Renderer class, which is responsible for rendering
a page into a Gfx::Bitmap. There are many improvements to make here,
but this is a great start!
2021-05-18 16:35:23 +02:00
Matthew Olsson
d6a9b41bac LibPDF: Parse page crop box and user units 2021-05-18 16:35:23 +02:00
Matthew Olsson
101639e526 LibPDF: Parse graphics commands 2021-05-18 16:35:23 +02:00
Matthew Olsson
03649f85e2 LibPDF: Don't rely on a stream's /Length key existing
Some PDFs omit this key apparently, but Firefox opens them fine. Let's
emulate that behavior.
2021-05-18 16:35:23 +02:00
Matthew Olsson
2f0a2865f2 LibPDF: Give Parser a reference to the Document
The Parser will need to call resolve_to on certain values.
2021-05-18 16:35:23 +02:00
Matthew Olsson
3aeaceb727 LibPDF: Parse nested Page Tree structures
We now follow nested page tree nodes to find all of the actual
page dicts, whereas previously we just assumed the root level
page tree node contained all of the page children directly.
2021-05-10 10:32:39 +02:00
Matthew Olsson
8c745ad0d9 LibPDF: Parse page structures
This commit introduces the ability to parse the document catalog dict,
as well as the page tree and individual pages. Pages obviously aren't
fully parsed, as we won't care about most of the fields until we
start actually rendering PDFs.

One of the primary benefits of the PDF format is laziness. PDFs are
not meant to be parsed all at once, and the same is true for pages.
When a Document is constructed, it builds a map of page number to
object index, but it does not fetch and parse any of the pages. A page
is only parsed when a caller requests that particular page (and is
cached going forwards).

This commit also adds an object_cast function, which logs bad casts
if DEBUG_PDF is set, and utility functions on ArrayObject and
DictObject that retrieve each type of object from the collections,
so callers don't have to cast manually.
2021-05-10 10:32:39 +02:00
Matthew Olsson
72f693e9ed LibPDF: Add a basic parser and Document structure
This commit adds a parser as well as the Reader class, which serves
as a utility to aid in reading the PDF both forwards and in reverse.
The parser currently is capable of reading xref tables, as well as
all values. We don't really do anything with any of this information,
however.
2021-05-10 10:32:39 +02:00
Matthew Olsson
a8f5b6aaa3 LibPDF: Create basic object structure
This commit is the start of LibPDF, and introduces some basic structure
objects. This emulates LibJS's Value structure, where Value is a simple
class that can contain a pointer to a more complex Object class with
more data. All of the basic PDF objects have a representation.
2021-05-10 10:32:39 +02:00