Commit graph

63 commits

Author SHA1 Message Date
Nico Weber
21917e7b1e LibPDF+PDFViewer+MacPDF: Don't draw hidden text by default
Text can be rendered in various ways in PDFs: Filled, stroked,
both filled and stroked, set as clipping path, hidden, or
some combinations thereof.

We don't implement any of this at the moment except "filled".

Hidden text is used in scanned documents: The image of the scan is
drawn in the background, and then OCRd text is "drawn" as hidden
on top of the scanned bitmap. That way, the (hidden) text can be
selected and copied, and it looks like you're selecting text from
the scanned bitmap. Find-in-page also works similarly. (We currently
have neither text selection nor find-in-page, but one day we will.)

Now that we have pretty good support for CCITT and are growing some
support for JBIG2, we now draw both the scanned background image
as well as the foreground text. They're not always perfectly aligned.

This change makes it so that we don't render text that's marked as
hidden. (We still do most of the coordinate math, which will probably
come in handy at some point when we implement text selection.)

This makes these scanned documents appear as they're supposed to
appear (at least in documents where we manage to decode the background
bitmap).

This also adds a debug option to force rendering of hidden text.
2024-03-16 13:10:48 -04:00
Kyle Lanmon
a099d0e140 PDFViewer: Hide the rendering diagnostics window by default
You can enable it in the Debug menu and we will remember your choice.
2024-03-04 10:43:41 +01:00
Nico Weber
9bff8abcc7 LibPDF: Add support for array image masks
An array image mask contains a min/max range for each channel,
and if each channel of a given pixel is in that channel's range,
that pixel is masked out (i.e. transparent). (It's similar to
having a single color or palette index be transparent, but it
supports a range of transparent colors if desired.)

What makes this a bit awkward is that the range is relative to the
origin bits per pixel and the inputs to the image's color space.

So an indexed (palettized) image with 4bpp has a 2-element mask
array where both entries are between 0 and 15.

We currently apply masks after converting images to a Gfx::Bitmap,
that is after converting to 8bpp sRGB. And we do this by mapping
everything to 8bpp very early on in load_image().

This leaves us with a bunch of options that are all a bit awkward:

1. Make load_image() store the up- (or for 16bpp inputs, down-)
   sampled-to-8bpp pixel data. And also return if we expanded the
   pixel range while resampling (for color values) or not (for
   palettized images). Then, when applying the image filter,
   resample the array bounds in exactly the same way. This requires
   passing around more stuff.

2. Like 1, but pass in the mask array to load_image() and apply
   the mask right there and then. This means we'd apply mask arrays
   at a different time than other masks.

3. Make the function that computes the mask from the mask array
   work from the original, unprocessed image data. This is the most
   local change, but probably also requires the largest amount of
   code (in return, the color mask for 16bpp images is precise, in
   addition that it separates concerns the most nicely).

This goes with 3 for now.
2024-03-03 11:18:37 -05:00
Nico Weber
472bc367d3 LibPDF: Do not have redundant variables for image size
This way, the size of the bitmap cannot become out of sync with these
variables.

No behavior change.
2024-02-27 17:39:13 -05:00
Nico Weber
03fab7089a LibPDF+PDFViewer: Extract Renderer::apply_page_rotation()
No behavior change.
2024-02-27 07:02:02 +01:00
Nico Weber
fa95e5ec0e LibPDF: Fix line drawing when line_width is 0
We used to skip lines with width 0. The correct behavior per spec
is to draw them one pixel wide instead.
2024-02-21 10:30:57 +01:00
Nico Weber
a0462f495c LibPDF+MacPDF: Clip text, and add a debug option for disabling it 2024-01-20 08:56:03 +01:00
Nico Weber
90fdf738a1 LibPDF: Alphabetize clip_ fields in RenderingPreferences
No behavior change.
2024-01-20 08:56:03 +01:00
Nico Weber
66f8259a0b LibPDF: Move ClipRAII to .h file
No behavior change.
2024-01-20 08:56:03 +01:00
Nico Weber
1845a406ea LibPDF: Add debug settings for clipping paths and images 2024-01-17 08:42:56 +00:00
Nico Weber
5615a2691a LibPDF: Extract activate_clip() / deactivate_clip() functions
No behavior change.
2024-01-17 08:42:56 +00:00
MacDue
d55867e563 LibPDF: Fix paths with negatively sized re (rect) commands
Turns out the width/height in a `re` command can be negative. This
results in rectangles with different winding orders. For example, a
negative width results in a reversed winding order.

Previously, this was lost by passing the rect through an
`AffineTransform` before constructing the path. So instead, this
constructs the rect path, and then transforms the resulting path.
2024-01-16 21:31:20 +00:00
Nico Weber
a3507ef65b LibPDF: Move error for /ImageMask out of load_image()
...and tweak load_image() to support loading mask images
(which don't have a color space and are always 1 bit per pixel).
2023-12-23 20:39:11 +01:00
Nico Weber
6032c06f6b Revert "LibPDF: Add basic tiled, coloured pattern rendering"
This reverts commit 8ff87911a3.
2023-12-21 19:24:56 +01:00
Nico Weber
7cb216c95b Revert "LibPDF: Offset PaintStyle when painting so pattern overlaps..."
This reverts commit 8c7fc4fe6c.
2023-12-21 19:24:56 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Kyle Pereira
8c7fc4fe6c LibPDF: Offset PaintStyle when painting so pattern overlaps properly 2023-12-10 16:44:24 +01:00
Kyle Pereira
8ff87911a3 LibPDF: Add basic tiled, coloured pattern rendering 2023-12-10 16:44:24 +01:00
Kyle Pereira
8191f2b47a LibPDF: Add parameter for background color of render 2023-12-10 16:44:24 +01:00
Kyle Pereira
082a4197b6 LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces
This is in anticipation of Pattern color space support which does not
yield a simple color.
2023-12-10 16:44:24 +01:00
Nico Weber
29396415d5 LibPDF: Add an initial implementation of type 3 glyph rendering
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.

We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.

Good test files from pdfa dataset 0000.zip:

- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
  text appears mirrored if the font matrix isn't handled correctly

- 0000425.pdf, page 1: Draws several glyphs in a single run;
  glyphs overlap if Renderer::render_type3_glyph() ignores the
  passed-in point

- 0000211.pdf, any page: Uses type 3 glyphs for all text.
  Good perf test (already "reasonably fast")

- 0000521.pdf, page 5 (or 7 or or 16): The little red flag in the
  purple box is a type 3 font glyph, and it's colored (which in part
  means the first operator is `d0`, while all the other documents above
  use `d1`)
2023-11-17 19:47:53 +00:00
Nico Weber
5513f8bbe3 LibPDF: Move ScopedState from a function on Renderer into Renderer
No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
bcc6439b5f LibPDF: Pass Renderer to PDFFont::draw_string()
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.

On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.

(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
5eaa403ddf LibPDF: Use font dictionary object as cache key, not resource name
In the main page contents, /T0 might refer to a different font than
it might refer to in an XObject. So don't use the `Tf` argument as
font cache key. Instead, use the address of the font dictionary object.

Fixes false cache sharing, and also allows us to share cache entries
if the same font dict is referred to by two different names.

Fixes a regression from 2340e834cd (but keeps the speed-up intact).
2023-11-17 19:14:39 +01:00
Tim Schumacher
a2f60911fe AK: Rename GenericTraits to DefaultTraits
This feels like a more fitting name for something that provides the
default values for Traits.
2023-11-09 10:05:51 -05:00
Nico Weber
78dea9500f LibPDF: Make operator parsing use ReadonlySpan instead of Vector
No behavior change.
2023-10-20 14:24:31 -04:00
Nico Weber
708d5e2fe6 LibPDF: Implement color_rendering_intent operator
Implements the `ri` operator, and the `RI` key in a graphics state
dictionary.

We don't do anything yet with the color rendering intent except
store it.

No behavior change except removing a few "not yet implemented"
messages.
2023-10-19 16:51:16 -04:00
Nico Weber
c8510b58a3 LibPDF: Cache fonts per page
Previously, every time a page switched fonts, we'd completely
re-parse the font.

Now, we cache fonts in Renderer, effectively caching them per page.

It'd be nice to have an LRU cache across pages too, but that's a
bigger change, and this already helps a lot.

Font size is part of the cache key, which means we re-parse the same
font at different font sizes. That could be better too, but again,
it's a big help as-is already.

Takes rendering the 1310 pages of the PDF 1.7 reference with

    Build/lagom/bin/pdf --debugging-stats \
        ~/Downloads/pdf_reference_1-7.pdf

from 71 s to 11s :^)

Going through pages especially in the index is noticeably snappier.

(On the PDF 2.0 spec, ISO_32000-2-2020_sponsored.pdf, it's less
dramatic: From 19s to 16s.)
2023-10-11 07:10:19 +02:00
Nico Weber
c625ba34fe LibPDF: Implement set_flatness_tolerance
We now track it in the graphics state. It isn't used for anything yet.
Fixes the one thing that rendering the first 100 pages of
pdf_reference_1-7.pdf complains about.
2023-07-12 18:22:52 -04:00
Rodrigo Tobar
db9fa7ff07 LibPDF: Allow show_text to return errors
Errors can (and do) occur when trying to render text, and so far we've
silently ignored them, making us think that all is well when it isn't.
Letting show_text return errors will allow us to inform the user about
these errors instead of having to hiding them.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
e87fecf710 LibPDF: Switch to best-effort PDF rendering
The current rendering routine aborts as soon as an error is found during
rendering, which potentially severely limits the contents we show on
screen. Moreover, whenever an error happens the PDFViewer widget shows
an error dialog, and doesn't display the bitmap that has been painted so
far.

This commit improves the situation in both fronts, implementing
rendering now with a best-effort approach. Firstly, execution of
operations isn't halted after an operand results in an error, but
instead execution of all operations is always attempted, and all
collected errors are returned in bulk. Secondly, PDFViewer now always
displays the resulting bitmap, regardless of error being produced or
not. To communicate errors, an on_render_errors callback has been added
so clients can subscribe to these events and handle them as appropriate.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
adc45635e9 LibPDF: Add initial image display support
After adding support for XObject Form rendering, the next was to display
XObject images. This commit adds this initial support,

Images come in many shapes and forms: encodings: color spaces, bits per
component, width, height, etc. This initial support is constrained to
the color spaces we currently support, to images that use 8 bits per
component, to images that do *not* use the JPXDecode filter, and that
are not Masks. There are surely other constraints that aren't considered
in this initial support, so expect breakage here and there.

In addition to supporting images, we also support applying an alpha mask
(SMask) on them. Additionally, a new rendering preference allows to skip
image loading and rendering altogether, instead showing an empty
rectangle as a placeholder (useful for when actual images are not
supported). Since RenderingPreferences is becoming a bit more complex,
we add a hash option that will allow us to keep track of different
preferences (e.g., in a HashMap).
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
ba16310739 LibPDF: Refactor parsing of ColorSpaces
ColorSpaces can be specified in two ways: with a stream as operands of
the color space operations (CS/cs), or as a separate PDF object, which
is then referred to by other means (e.g., from Image XObjects and other
entities). These two modes of addressing a ColorSpace are slightly
different and need to be addressed separately. However, the current
implementation embedded the full logic of the first case in the routine
that created ColorSpace objects.

This commit refactors the creation of ColorSpace to support both cases.
First, a new ColorSpaceFamily class encapsulates the static aspects of a
family, like its name or whether color space construction never requires
parameters. Then we define the supported ColorSpaceFamily objects.

On top of this also sit a breakage on how ColorSpaces are created. Two
methods are now offered: one only providing construction of no-argument
color spaces (and thus taking a simple name), and another taking an
ArrayObject, hence used to create ColorSpaces requiring arguments.

Finally, on top of *that* two ways to get a color space in the Renderer
are made available: the first creates a ColorSpace with a name and a
Resources dictionary, and another takes an Object. These model the two
addressing modes described above.
2022-12-10 10:49:03 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Rodrigo Tobar
b3007c17bd LibPDF: Allow operators to receive optional resources
Operators usually assume that the resources its operations will require
will be the Page's. This assumption breaks however when XObjects with
their own resources come into the picture (and maybe other cases too).
In that case, the XObject's resources take precedence, but they should
also contain the Page's resources. Because of this, one can safely use
the XObject resources alone when given, and default to the Page's if
not.

This commit adds all operator calls an extra argument with optional
resources, which will be fed by XObjects as necessary.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
fe5c823989 LibPDF: Communicate resources to ColorSpace, not Page
Resources can come from other sources (e.g., XObjects), and since the
only attribute we are reading from Page are its resources it makes sense
to receive resources instead. That way we'll be able to pass down
arbitrary resources that are not necessarily declared at the page level.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
e92ec26771 LibPDF: Introduce rendering preferences and show clipping paths
A new struct allows users to specify specific rendering preferences that
the Renderer class might use to paint some Document elements onto the
target bitmap. The first toggle allows rendering (or not) the clipping
paths on a page, which is useful for debugging.
2022-11-25 23:03:24 +01:00
Rodrigo Tobar
a1e36e8f78 LibPDF: Improve path clipping support
The existing path clipping support was broken, as it performed the
clipping operation as soon as the path clipping commands (W/W*) were
received. The correct behavior is to keep a clipping path in the
graphic state, *intersect* that with the current path upon receiving
W/W*, and apply the clipping when performing painting operations. On top
of that, the intersection happening at W/W* time does not affect the
painting operation happening on the current on-build path, but takes
effect only after the current path is cleared; therefore a current and a
next clipping path need to be kept track of.

Path clipping is not yet supported on the Painter class, nor is path
intersection. We thus continue using the same simplified bounding box
approach to calculate clipping paths.

Since now we are dealing with more rectangles-as-path code, I've made
helper functions to build a rectangle path and reuse it as needed.
2022-11-25 23:03:24 +01:00
Julian Offenhäuser
9cb3b23377 LibPDF: Move all font handling to Type1Font and TrueTypeFont classes
It was previously the job of the renderer to create fonts, load
replacements for the standard 14 fonts and to pass the font size back
to the PDFFont when asking for glyph widths.

Now, the renderer tells the font its size at creation, as it doesn't
change throughout the life of the font. The PDFFont itself is now
responsible to decide whether or not it needs to use a replacement
font, which still is Liberation Serif for now.

This means that we can now render embedded TrueType fonts as well :^)

It also makes the renderer's job much more simple and leads to a much
cleaner API design.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
97ed4106e5 LibPDF: Fix text positioning with operator TJ
As per spec, the positioning (or kerning) parameter of this operator
should translate the text matrix before the next showing of text.
Previously, this calculation was slightly wrong and also only applied
after the text was already shown.
2022-09-17 10:07:14 +01:00
sin-ack
c8585b77d2 Everywhere: Replace single-char StringView op. arguments with chars
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
7456904a39 Meta+Userland: Simplify some formatters
These are mostly minor mistakes I've encountered while working on the
removal of StringView(char const*). The usage of builder.put_string over
Format<FormatString>::format is preferrable as it will avoid the
indirection altogether when there's no formatting to be done. Similarly,
there is no need to do format(builder, "{}", number) when
builder.put_u64(number) works equally well.

Additionally a few Strings where only constant strings were used are
replaced with StringViews.
2022-07-12 23:11:35 +02:00
Simon Wanner
206d6ece55 LibGfx: Move other font-related files to LibGfx/Font/ 2022-04-09 23:48:18 +02:00
Matthew Olsson
468ceb1b48 LibPDF: Rename Command to Operator
This is the correct name, according to the spec
2022-03-31 18:10:45 +02:00
Matthew Olsson
5f9d35909d LibPDF: Move font files into their own directory 2022-03-31 18:10:45 +02:00
Matthew Olsson
130846f337 LibPDF: Use AntiAliasingPainter in Renderer when possible 2022-03-31 18:10:45 +02:00
Matthew Olsson
9a4a3318a9 LibPDF: Store a PDFFont in the Renderer's text state 2022-03-29 02:52:57 +02:00
Matthew Olsson
b240d23a87 LibPDF: Propagate errors in Renderer/PDFViewer 2022-03-07 10:53:57 +01:00