Commit graph

20698 commits

Author SHA1 Message Date
Tim Ledbetter
7ee09ca49d LibGfx/WOFF: Avoid overflow in table directory search range
This commit limits `WOFF::Header::num_tables` to 4096. This limitation
is not explicitly mentioned in the specification, but allowing numbers
larger than this results in an overflow when calculating
`search_range` and `range_shift`.
2023-10-24 07:29:09 +02:00
Timothy Flynn
6af279a22d LibWebView: Add a helper to get selected text with collapsed whitespace 2023-10-24 07:28:30 +02:00
Timothy Flynn
e221b3afeb LibWebView: Add an API to format a search query for UI display
This will create a string of the form:

    Search DuckDuckGo for "Ladybird is awesome!"

If the provided query URL is unknown, the engine name is excluded (e.g.
for custom search URLs).
2023-10-24 07:28:30 +02:00
Timothy Flynn
c8c3d00615 LibWebView: Rename find_search_engine to find_search_engine_by_name
We will also need to search by URL in the Serenity chrome.
2023-10-24 07:28:30 +02:00
Bastiaan van der Plaat
d5ca8209bf LibWeb: Add input element valueAsNumber 2023-10-24 07:27:14 +02:00
Bastiaan van der Plaat
d290569535 LibWeb: Don't create shadow root for input hidden 2023-10-24 07:27:14 +02:00
Bastiaan van der Plaat
3e778fe29c LibWeb: Reorder and add missing form elements IDL items 2023-10-24 07:27:14 +02:00
Aliaksandr Kalenik
4dab17427f LibWeb: Use max content contribution in flex_fraction in GFC
As spec comment in the code says we should use item’s max-content
contribution to calculate flex fraction.

Likely, it was calculate_max_content_size() because we didn't have
calculate_max_content_contribution() when this function was implemented
initially.
2023-10-24 07:26:25 +02:00
Aliaksandr Kalenik
802b58d7e1 LibWeb: Resolve grid item's min-width and max-width in GFC
Now min-width and max-width properties affect resulting width of grid
item instead of being ignored.
2023-10-24 07:25:20 +02:00
Adrian Mück
7e10f76021 LibGUI: Don't update ComboBox text when model index is invalid
Without ENABLE_TIME_ZONE_DATA the user is able to update the time zone
selection when using "on_mousewheel" or "on_{up,down}_pressed" even
though UTC is supposed to be the only option. Due to an invalid model
index, the selection is set to "[null]".

Prior to this patch, the ComboBox text in "selection_updated" is set
at each call.
2023-10-23 21:18:36 -04:00
Nico Weber
3fe9f8e48d LibPDF: Don't accidentally form new tokens on pages with contents arrays
A page's /Contents can be an array of streams, and the page's contents
are then as if those streams are concatenated.

Most of the time, a stream ends with whitespace. But in some cases
(e.g. 0000642.pdf from 0000.zip from the pdfa dataset), the first
stream ends with an operator (`Q`) and the next stream starts with
one (`q`), and the concatenation would form a new, unkonwn operator
(`Qq`). Separate the streams' contents with a space to prevent that.

Reduces numbers of PDF files we fail to open in the -n 500 case
from 11 to 10 (in either case, we then crash on 18 of the PDFs
that we do manage to open).
2023-10-23 13:23:54 -04:00
Timothy Flynn
b770ed03ac LibWebView: Define the list of built-in search engines in LibWebView
These engines and their query URLs are duplicated in several places.
Before implementing search support in the AppKit chrome, let's move
these engines to LibWebView.
2023-10-23 12:12:36 -04:00
Nico Weber
11bee7a075 LibPDF: Don't crash on fixed-width type 1 fonts that use /MissingWidth
Type 1 fonts usually have a m_font_program and no m_font -- they only
have m_font if we're using a replacement font for the fonts that
were built-in to PDFs before Acrobat 4.0 (and must still work to
show existing files).

However, SimpleFont::get_glyph_width() used to always return a
float, which in Type1Font was only implemented if m_font was set.

Per spec, we're supposed to just use /MissingWidth for fonts that
are missing an entry in the descriptor's /Width array. However, for
built-in fonts, no explicit /Width array is needed (PDF 1.7 spec,
Appendix H.3, 5.5.1). So if we just always use /MissingWidth,
then PDFs that use a built-in font draw all their text on top
of each other (e.g. 000333.pdf from stillhq.com-pdfdb).

So change get_glyph_width() to return Optional<float>, return
it only in Type1Font if m_font is set, and use MissingWidth
if it isn't set.

That way, replacement fonts still return a width, and real
fonts that are supposed to have /Width and use /MissingWidth
for missing entries do what they're supposed to too, instead
of crashing.

From 20 (6%) to 16 (5%) crashes on the 300 first PDFs, and from
39 (7.8%) to 31 (6.2%) on the 500-random PDFs test.
2023-10-23 09:33:03 -04:00
Nico Weber
52afa936c4 LibPDF: Don't over-read in charset formats 1 and 2
`left` might be a number bigger than there are actually glyphs in the
CFF.

The spec says "The number of ranges is not explicitly specified in the
font. Instead, software utilizing this data simply processes ranges
until all glyphs in the font are covered." Apparently we have to check
for this within each range as well.

Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the
pdfa dataset.

Together with the previous commit:

From 21 (7%) to 20 (6%) crashes on the 300 first PDFs, and from
41 (8.2%) to 39 (7.8%) on the 500-random PDFs test.
2023-10-23 09:31:11 -04:00
Nico Weber
58ff7b5336 LibPDF: Support offset size 3 in CFF index reading
...and replace template instantiations with a loop, to make this
easily possible.

Vaguely nice for code size as well.

Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the
pdfa dataset.
2023-10-23 09:31:11 -04:00
Nico Weber
3197f0cab6 LibPDF: Handle CFF fonts with charset format 0 and > 255 glyphs better
We used to use an u8 as loop counter, which would overflow
if there were more than 255 glyphs, producing hundreds of megabytes
of

    Couldn't find string for SID x, going with space

output in the process, while all data until the end of the CFF
section got interpreted as SIDs, until a try_read() would finally
fail.

We now no longer fail miserably trying to render page 2 of
0000352.pdf of 0000.zip from the pdfa dataset.

Fixes just one crash of the larger 500-document test set, but
when I tweak test_pdf.py to print all stacks instead of just the
top 5, it no longer produces 260 MB of output.
2023-10-23 09:31:11 -04:00
Nico Weber
0869ca5615 LibPDF: Add more CFF_DEBUG output 2023-10-23 09:31:11 -04:00
Nico Weber
cf705eb235 LibPDF: Use TRY() to get decompression result
Makes us die with a better error message for some PDFs.
2023-10-23 09:30:41 -04:00
Nico Weber
6153dd7b84 LibPDF: Tolerate comments after dict values
Makes 0000607.pdf from 0000.zip from the pdfa dataset load.
2023-10-23 09:28:00 -04:00
Jesús (gsus) Lapastora
2086b8df9c LibJS/Date: Ensure YearFromTime(t) holds invariant after approximation
As of https://tc39.es/ecma262/#sec-yearfromtime, YearFromTime(t) should
return `y` such that `TimeFromYear(YearFromTime(t)) <= t`. This wasn't
held, since the approximation contained decimal digits that would nudge
the final value in the wrong direction.

Adapted from Kiesel:
6548a85743

Co-authored-by: Linus Groh <mail@linusgroh.de>
2023-10-23 09:26:55 -04:00
Bastiaan van der Plaat
5870a1a9a1 AK: Remove rarely used ExtraMathConstants.h 2023-10-23 12:04:51 +01:00
Nico Weber
a1f17bd643 LibPDF: Skip inline image data in operator stream
Inline images can contain arbitrary binary data in the operator stream,
greatly confusing the operator parser.

Just skip them for now. They'll produce a
`Rendering of feature not supported: draw operation: inline_image_begin`
diag as usual, so we won't forget about it.

After #21536, reduces number of crashes on 300 random PDFs from the web
(the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 23 (7%) to 22 (7%).

On a larger sample (`Meta/test_pdf.py -n 500 ~/Downloads/0000`),
reduces number of crashes from 53 (10.6%) with 36 distinct crash
stacks to 46 (9.2%) with 33 distinct stacks.
2023-10-23 07:51:08 +02:00
Sam Atkins
e108f394bf LibGfx: Replace manual offsets when producing WOFF2 loca table 2023-10-22 19:42:22 +02:00
Sam Atkins
885665b3a6 LibGfx: Simplify writing to WOFF2 reconstructed glyf table 2023-10-22 19:42:22 +02:00
Sam Atkins
ad717af63d LibGfx: Read WOFF2 transformed GLYF table buffers in-place
This saves us from having to allocate a buffer and copying the data,
when it's already available to us. Also, less code. :^)
2023-10-22 19:42:22 +02:00
Sam Atkins
9642a0f43a LibGfx: Use a struct for reading WOFF2 transformed GLYF table 2023-10-22 19:42:22 +02:00
Sam Atkins
8e96902c75 LibGfx: Use OpenType offset table structs when reading WOFF2 font data 2023-10-22 19:42:22 +02:00
Sam Atkins
b73b434f80 LibGfx: Use a Header struct when reading WOFF2 font data 2023-10-22 19:42:22 +02:00
Sam Atkins
9f93ae4bfc LibGfx: Use offset table structs when reading WOFF font data 2023-10-22 19:42:22 +02:00
Sam Atkins
d80c528eb4 LibGfx: Add structs for OpenType offset table 2023-10-22 19:42:22 +02:00
Sam Atkins
e7fe377501 LibGfx: Use a Stream to read WOFF font data
This lets us read structs directly from the data, instead of having to
construct them from manual offsets.
2023-10-22 19:42:22 +02:00
Aliaksandr Kalenik
122d847720 LibWeb: Fix building of areas spanning multiple rows in GFC
Rewrites the grid area building to accurately identify areas that span
multiple rows. Also now we can recognize invalid areas but do not
handle them yet.
2023-10-22 19:38:18 +02:00
MacDue
49366951ee LibWeb: Fix outer box-shadows after 063e66c
The shrink should only be applied for inner box-shadows.
2023-10-22 18:38:22 +02:00
MacDue
1c012f0a4a LibWeb: Remove "cached corner bitmap" and its use in the corner clipper
With the recording painter the actual painting operations are delayed,
so now if multiple corner clippers are constructed, and they use a
shared bitmap they can interfere with each other. The use of this shared
bitmap was somewhat questionable anyway, so this is not much of a loss.

This fixes the border-radius.html test page.
2023-10-21 23:16:17 +02:00
Nico Weber
1a58fee0fd LibPDF: Don't assert on named simple color space
If a PDF uses `/CustomName cs` and `/CustomName` then points at just a
name like `/DeviceGray` instead of an array, that's ok. Just using
`/DeviceGray cs` is simpler, so this extra level of indirection is
somewhat rare in practice, but it's valid and it does happen. So support
it.

We already have a helper that does the right thing that we just need to
call.

Together with #21524 and #21525, reduces number of crashes on 300 random
PDFs from the web (the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 29 (9%) to 25 (8%).
2023-10-21 21:04:26 +02:00
Nico Weber
04aec4a032 LibPDF: Don't log CFF Copyright tag as unknown 2023-10-21 21:04:02 +02:00
Aliaksandr Kalenik
f32764975a LibWeb: Remove ClearRect command in RecordingPainter
There is only one usage of ClearRect command and it could be replaced
with FillRect to make set of commands in RecordingPainter smaller.
2023-10-21 18:50:28 +02:00
Nico Weber
8922574133 LibPDF: Fix assertion when destination page is an index
This isn't correct per spec, but it happens in practice, e.g.
0000847.pdf, 0000327.pdf, 0000124.pdf from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/
2023-10-21 09:10:30 +02:00
Nico Weber
fbd00d9c8e LibPDF: Use resolve_to on /Dests entry
Fixes an assertion if /Dests is an indirect object (`24 0 R`)
instead of an inline dictionary.
2023-10-21 09:10:30 +02:00
Nico Weber
8c3478a921 LibPDF: Use resolve_to() helper
No behavior change.
2023-10-21 09:10:30 +02:00
Nico Weber
801cfd5ae3 LibPDF: Let parser process filters by default
This fixes a small bug from 39b2eed3f6: That commit tried to disable
filters for the very first object read, for the case covered in
Tests/LibPDF/password-is-sup.pdf.

However, it accidentally also disabled filters by default.

Most of the time, this isn't really a difference: We call
`set_filters_enabled(true);` very early in
`DocumentParser::initialize_linearization_dict()`, which explicitly
enables filters, and `initialize_linearization_dict()` is the very
first thing called in `DocumentParser::initialize()`.

But there's an early exit in `initialize_linearization_dict()`
for if there's nothing looking like an indirect object right
after the header, and in this case we used to not enable
filtering, and would hand compressed streams to the operand parser.

(And due to a 2nd bug, we'd even do this if the header line was
followed by an empty line.)
2023-10-21 09:09:53 +02:00
Nico Weber
cf26fc2393 LibPDF: Make parser skip whitespace after header
0000990.pdf from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/
starts like so:

```
%PDF-1.7

4 0 obj
```

parse_heaader() used to put the cursor at the start of the 2nd,
empty, line. initialize_linearization_dict() would then check
if `m_reader.matches_number()` to see if there could possibly
be a linearization dict.

In this case, there isn't one, but we should detect linearization
dicts even if they're separated by whitespace from the first line.
2023-10-21 09:09:53 +02:00
Nico Weber
34cb506bad LibPDF: Replace another TODO with a message
Like ca1a98ba9f, but for stroke color.
2023-10-21 09:09:06 +02:00
Aliaksandr Kalenik
719b12b19d LibWeb: Support alignment of abspos grid items
Grid items should respect alignment properties if top/right/bottom/left
are not specified.

This change adds a separate implementation of
layout_absolutely_positioned_element that is extended with support for
alignment.
2023-10-21 09:08:51 +02:00
Aliaksandr Kalenik
2def1de4be LibWeb: Rerun rows sizings if grid auto height is less than min-height
If the first pass of rows sizing results in the container's automatic
height being less than the specified min-height, we need to run a
second pass using the updated available space.
2023-10-21 09:08:11 +02:00
Nico Weber
9442782881 LibPDF: Implement text_next_line_show_string_set_spacing
Not used terribly often, but e.g. used in 000333.pdf page 17 in
stillhq.com-pdfdb.
2023-10-20 14:24:31 -04:00
Nico Weber
78dea9500f LibPDF: Make operator parsing use ReadonlySpan instead of Vector
No behavior change.
2023-10-20 14:24:31 -04:00
Nico Weber
e0268dcc87 LibPDF: Allow /Pattern to be used directly as a color space name
Per spec:

"If the color space is one that can be specified by a name and no
additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain
cases of Pattern), the name may be specified directly."

We still don't implement /Pattern color spaces, but now we no longer
crash trying to look up the potentially-nonexistent /ColorSpace
dictionary on the page object when /Pattern is used directly as color
space name.

On top of #21514, reduces number of crashes on 300 random PDFs from the
web (the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 42 (14%) to 34 (11%).
2023-10-20 10:35:54 -06:00
Nico Weber
aea0e2f313 LibPDF: Rename ColorSpaceFamily function to may_be_specified_directly()
It used to be called ColorSpaceFamily::never_needs_parameters().

But in the cpp file, the macro arg was called ever_needs_parameters,
and the spec says

"If the color space is one that can be specified by a name and no
additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain
cases of Pattern), the name may be specified directly."

so let's use that language here.

No behavior change.
2023-10-20 10:35:54 -06:00
Nico Weber
095a2a17ed LibPDF: Replace TODO()s in Type0Font code with Errors
...which causes us to not render these fonts instead of crashing.

Reduces number of crashes on 300 random PDFs from the web (the first 300
from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 64 (21%) to 42 (14%).
2023-10-20 10:33:59 -06:00