55654 commits

Author SHA1 Message Date
Nico Weber
cf705eb235 LibPDF: Use TRY() to get decompression result
Makes us die with a better error message for some PDFs.
2023-10-23 09:30:41 -04:00
Nico Weber
6153dd7b84 LibPDF: Tolerate comments after dict values
Makes 0000607.pdf from 0000.zip from the pdfa dataset load.
2023-10-23 09:28:00 -04:00
Jesús (gsus) Lapastora
2086b8df9c LibJS/Date: Ensure YearFromTime(t) holds invariant after approximation
Per https://tc39.es/ecma262/#sec-yearfromtime, YearFromTime(t) should
return `y` such that `TimeFromYear(YearFromTime(t)) <= t`. This didn't
hold, since the approximation contained decimal digits that would nudge
the final value in the wrong direction.
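
A minimal Python sketch of the invariant, not the code from this commit: it
approximates the year with a floating-point division and then corrects the
result using the spec's integer `DayFromYear` formula. The function names
are Python spellings of the spec operations, and the correction loops are an
assumption about one reasonable way to restore the invariant.

```python
import math

MS_PER_DAY = 86_400_000


def day_from_year(y: int) -> int:
    # ECMA-262 DayFromYear(y); Python's // floors, matching the spec's floor().
    return (365 * (y - 1970) + (y - 1969) // 4
            - (y - 1901) // 100 + (y - 1601) // 400)


def time_from_year(y: int) -> int:
    # ECMA-262 TimeFromYear(y), in milliseconds since the epoch.
    return MS_PER_DAY * day_from_year(y)


def year_from_time(t: int) -> int:
    # Floating-point approximation; it can be off by one near year boundaries.
    year = math.floor(t / (365.2425 * MS_PER_DAY)) + 1970
    # Correct downwards until TimeFromYear(year) <= t holds.
    while time_from_year(year) > t:
        year -= 1
    # Correct upwards so that year is the largest value satisfying it.
    while time_from_year(year + 1) <= t:
        year += 1
    return year
```

The two loops only ever run a step or so, since the approximation is
already close; they exist to make the result exact rather than nearly right.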

Adapted from Kiesel:
6548a85743

Co-authored-by: Linus Groh <mail@linusgroh.de>
2023-10-23 09:26:55 -04:00
Bastiaan van der Plaat
5870a1a9a1 AK: Remove rarely used ExtraMathConstants.h 2023-10-23 12:04:51 +01:00
Nico Weber
a1f17bd643 LibPDF: Skip inline image data in operator stream
Inline images can contain arbitrary binary data in the operator stream,
greatly confusing the operator parser.

Just skip them for now. They'll produce a
`Rendering of feature not supported: draw operation: inline_image_begin`
diag as usual, so we won't forget about it.
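
A rough Python sketch of what skipping inline image data can look like. This
is not LibPDF's implementation: `skip_inline_image` is a hypothetical
helper, and the end-of-data rule (first `EI` surrounded by whitespace) is a
heuristic assumed here for illustration.

```python
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def skip_inline_image(stream: bytes, id_pos: int) -> int:
    """Return the offset just past the `EI` that ends an inline image.

    `id_pos` is the offset of the `ID` operator. The data is assumed to end
    at the first `EI` with whitespace before it and whitespace or the end of
    the stream after it. Binary data can contain that pattern by chance, so
    this heuristic may stop early.
    """
    pos = id_pos + 2
    while True:
        pos = stream.find(b"EI", pos)
        if pos == -1:
            raise ValueError("inline image data without a closing EI")
        before_ok = stream[pos - 1] in PDF_WHITESPACE
        after_ok = pos + 2 == len(stream) or stream[pos + 2] in PDF_WHITESPACE
        if before_ok and after_ok:
            return pos + 2
        pos += 1


# The first "EI" is part of the binary data and gets ignored.
stream = b"q BI /W 1 ID \x01\x02EIz\x03\nEI\nQ"
end = skip_inline_image(stream, stream.index(b"ID"))
```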

After #21536, reduces number of crashes on 300 random PDFs from the web
(the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 23 (7%) to 22 (7%).

On a larger sample (`Meta/test_pdf.py -n 500 ~/Downloads/0000`),
reduces number of crashes from 53 (10.6%) with 36 distinct crash
stacks to 46 (9.2%) with 33 distinct stacks.
2023-10-23 07:51:08 +02:00
Sam Atkins
e108f394bf LibGfx: Replace manual offsets when producing WOFF2 loca table 2023-10-22 19:42:22 +02:00
Sam Atkins
885665b3a6 LibGfx: Simplify writing to WOFF2 reconstructed glyf table 2023-10-22 19:42:22 +02:00
Sam Atkins
ad717af63d LibGfx: Read WOFF2 transformed GLYF table buffers in-place
This saves us from having to allocate a buffer and copy the data,
when it's already available to us. Also, less code. :^)
2023-10-22 19:42:22 +02:00
Sam Atkins
9642a0f43a LibGfx: Use a struct for reading WOFF2 transformed GLYF table 2023-10-22 19:42:22 +02:00
Sam Atkins
8e96902c75 LibGfx: Use OpenType offset table structs when reading WOFF2 font data 2023-10-22 19:42:22 +02:00
Sam Atkins
b73b434f80 LibGfx: Use a Header struct when reading WOFF2 font data 2023-10-22 19:42:22 +02:00
Sam Atkins
9f93ae4bfc LibGfx: Use offset table structs when reading WOFF font data 2023-10-22 19:42:22 +02:00
Sam Atkins
d80c528eb4 LibGfx: Add structs for OpenType offset table 2023-10-22 19:42:22 +02:00
Sam Atkins
e7fe377501 LibGfx: Use a Stream to read WOFF font data
This lets us read structs directly from the data, instead of having to
construct them from manual offsets.
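
For illustration only, the same idea in Python: read the whole 44-byte WOFF
header as one struct from a stream instead of picking fields out at manual
offsets. The field layout follows the WOFF 1.0 header; `WoffHeader` and
`read_header` are hypothetical names, not the LibGfx ones.

```python
import io
import struct
from typing import NamedTuple


class WoffHeader(NamedTuple):
    signature: int
    flavor: int
    length: int
    num_tables: int
    reserved: int
    total_sfnt_size: int
    major_version: int
    minor_version: int
    meta_offset: int
    meta_length: int
    meta_orig_length: int
    priv_offset: int
    priv_length: int


# Big-endian, no padding: 44 bytes in total.
HEADER_STRUCT = struct.Struct(">IIIHHIHHIIIII")


def read_header(stream) -> WoffHeader:
    raw = stream.read(HEADER_STRUCT.size)
    if len(raw) != HEADER_STRUCT.size:
        raise ValueError("truncated WOFF header")
    return WoffHeader(*HEADER_STRUCT.unpack(raw))


# Round-trip a synthetic header; 0x774F4646 is the 'wOFF' signature.
data = HEADER_STRUCT.pack(0x774F4646, 0x00010000, 44, 0, 0, 12, 1, 0, 0, 0, 0, 0, 0)
header = read_header(io.BytesIO(data))
```

After the read, the stream is positioned at the table directory, so the next
struct can be read the same way without tracking offsets by hand.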
2023-10-22 19:42:22 +02:00
Aliaksandr Kalenik
122d847720 LibWeb: Fix building of areas spanning multiple rows in GFC
Rewrites the grid area building to accurately identify areas that span
multiple rows. Also now we can recognize invalid areas but do not
handle them yet.
2023-10-22 19:38:18 +02:00
MacDue
8eab44896a Tests/LibWeb: Add outer box-shadow ref test
This is a screenshot test based on a cut-down version of the box-shadow
demo page.
2023-10-22 18:38:22 +02:00
MacDue
49366951ee LibWeb: Fix outer box-shadows after 063e66c
The shrink should only be applied for inner box-shadows.
2023-10-22 18:38:22 +02:00
segfaultdev
c93df9ead9 Base: Add & rename emoji
💁 - U+1F481 PERSON TIPPING HAND
💁‍♀️ - U+1F481 U+200D U+2640 WOMAN TIPPING HAND
💁‍♂️ - U+1F481 U+200D U+2642 MAN TIPPING HAND
💆 - U+1F486 PERSON GETTING MASSAGE
💆‍♀️ - U+1F486 U+200D U+2640 WOMAN GETTING MASSAGE
💆‍♂️ - U+1F486 U+200D U+2642 MAN GETTING MASSAGE
💇 - U+1F487 PERSON GETTING HAIRCUT
💇‍♀️ - U+1F487 U+200D U+2640 WOMAN GETTING HAIRCUT
💇‍♂️ - U+1F487 U+200D U+2642 MAN GETTING HAIRCUT
🙅 - U+1F645 PERSON GESTURING NO
🙅‍♀️ - U+1F645 U+200D U+2640 WOMAN GESTURING NO
🙅‍♂️ - U+1F645 U+200D U+2642 MAN GESTURING NO
🙆 - U+1F646 PERSON GESTURING OK
🙆‍♀️ - U+1F646 U+200D U+2640 WOMAN GESTURING OK
🙆‍♂️ - U+1F646 U+200D U+2642 MAN GESTURING OK
🙇 - U+1F647 PERSON BOWING
🙇‍♀️ - U+1F647 U+200D U+2640 WOMAN BOWING
🙇‍♂️ - U+1F647 U+200D U+2642 MAN BOWING
🙋 - U+1F64B PERSON RAISING HAND
🙋‍♀️ - U+1F64B U+200D U+2640 WOMAN RAISING HAND
🙋‍♂️ - U+1F64B U+200D U+2642 MAN RAISING HAND
2023-10-22 14:08:03 +01:00
segfaultdev
c7baea5d29 Base: Add more emoji
🧑‍💻 - U+1F9D1 U+200D U+1F4BB TECHNOLOGIST
🧑‍💼 - U+1F9D1 U+200D U+1F4BC OFFICE WORKER
🧑‍🔧 - U+1F9D1 U+200D U+1F527 MECHANIC
🧑‍🔬 - U+1F9D1 U+200D U+1F52C SCIENTIST
🧑‍🚀 - U+1F9D1 U+200D U+1F680 ASTRONAUT
🧑‍🚒 - U+1F9D1 U+200D U+1F692 FIREFIGHTER
2023-10-22 14:08:03 +01:00
segfaultdev
ea25865d89 Base: Add more emoji
👨‍⚕️ - U+1F468 U+200D U+2695 MAN HEALTH WORKER
👨‍⚖️ - U+1F468 U+200D U+2696 MAN JUDGE
👩‍⚕️ - U+1F469 U+200D U+2695 WOMAN HEALTH WORKER
👩‍⚖️ - U+1F469 U+200D U+2696 WOMAN JUDGE
👳 - U+1F473 PERSON WEARING TURBAN
👳‍♀️ - U+1F473 U+200D U+2640 WOMAN WEARING TURBAN
👳‍♂️ - U+1F473 U+200D U+2642 MAN WEARING TURBAN
🧑‍⚕️ - U+1F9D1 U+200D U+2695 HEALTH WORKER
🧑‍⚖️ - U+1F9D1 U+200D U+2696 JUDGE
2023-10-22 14:08:03 +01:00
segfaultdev
c65f19186d Base: Add more emoji
👨 - U+1F468 MAN
👨‍🦰 - U+1F468 U+200D U+1F9B0 MAN RED HAIR
👨‍🦱 - U+1F468 U+200D U+1F9B1 MAN CURLY HAIR
👨‍🦲 - U+1F468 U+200D U+1F9B2 MAN BALD
👨‍🦳 - U+1F468 U+200D U+1F9B3 MAN WHITE HAIR
👩 - U+1F469 WOMAN
👩‍🦰 - U+1F469 U+200D U+1F9B0 WOMAN RED HAIR
👩‍🦱 - U+1F469 U+200D U+1F9B1 WOMAN CURLY HAIR
👩‍🦲 - U+1F469 U+200D U+1F9B2 WOMAN BALD
👩‍🦳 - U+1F469 U+200D U+1F9B3 WOMAN WHITE HAIR
👱 - U+1F471 PERSON BLOND HAIR
👱‍♀️ - U+1F471 U+200D U+2640 WOMAN BLOND HAIR
👱‍♂️ - U+1F471 U+200D U+2642 MAN BLOND HAIR
🧑 - U+1F9D1 PERSON
🧑‍🦰 - U+1F9D1 U+200D U+1F9B0 PERSON RED HAIR
🧑‍🦱 - U+1F9D1 U+200D U+1F9B1 PERSON CURLY HAIR
🧑‍🦲 - U+1F9D1 U+200D U+1F9B2 PERSON BALD
🧑‍🦳 - U+1F9D1 U+200D U+1F9B3 PERSON WHITE HAIR
2023-10-22 14:08:03 +01:00
segfaultdev
eb07a08178 Base: Add more emoji
👨‍💻 - U+1F468 U+200D U+1F4BB MAN TECHNOLOGIST
👨‍💼 - U+1F468 U+200D U+1F4BC MAN OFFICE WORKER
👨‍🔧 - U+1F468 U+200D U+1F527 MAN MECHANIC
👨‍🔬 - U+1F468 U+200D U+1F52C MAN SCIENTIST
👨‍🚀 - U+1F468 U+200D U+1F680 MAN ASTRONAUT
👨‍🚒 - U+1F468 U+200D U+1F692 MAN FIREFIGHTER
👩‍💻 - U+1F469 U+200D U+1F4BB WOMAN TECHNOLOGIST
👩‍💼 - U+1F469 U+200D U+1F4BC WOMAN OFFICE WORKER
👩‍🔧 - U+1F469 U+200D U+1F527 WOMAN MECHANIC
👩‍🔬 - U+1F469 U+200D U+1F52C WOMAN SCIENTIST
👩‍🚀 - U+1F469 U+200D U+1F680 WOMAN ASTRONAUT
👩‍🚒 - U+1F469 U+200D U+1F692 WOMAN FIREFIGHTER
2023-10-22 14:08:03 +01:00
implicitfield
2745b48e16 Shell: Don't try to cast NonnullRefPtrs when printing debug output
Fixes a regression from 8a48246e.
2023-10-22 02:02:35 +03:30
hanaa12G
54e1470467 AK: Pass correct length to StringUtils::convert_to_floating_point()
Fixed the issue in StringUtils::convert_to_floating_point() where the
end pointer of the trimmed string was not being passed, causing the
function to consistently return 'None' when given strings with trailing
whitespace.
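
A small Python model of this class of bug, with hypothetical helper names
(this is not the AK code): a strict range parser only succeeds if the whole
range is a number, so passing the end of the untrimmed string makes every
input with trailing whitespace fail.

```python
def parse_float_range(text: str, start: int, end: int):
    """Hypothetical strict parser: succeeds only if all of text[start:end]
    is a number, with no surrounding whitespace. Returns None on failure."""
    chunk = text[start:end]
    if chunk == "" or chunk != chunk.strip():
        return None
    try:
        return float(chunk)
    except ValueError:
        return None


def convert_buggy(text: str):
    start = len(text) - len(text.lstrip())
    # Bug: uses the end of the untrimmed string.
    return parse_float_range(text, start, len(text))


def convert_fixed(text: str):
    start = len(text) - len(text.lstrip())
    end = len(text.rstrip())  # end of the trimmed string
    return parse_float_range(text, start, end)
```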
2023-10-22 00:22:29 +02:00
MacDue
f0be812fc2 Tests/LibWeb: Add border-radius ref test
This is based on the border-radius.html demo page with text and most
assets removed. This has to be a screenshot based test as there's not
really something else that can be used for comparison.

This also makes the test a little incomplete as things like text
overflow clipping are not tested, but I'd like to avoid this test being
too brittle.
2023-10-21 23:16:17 +02:00
MacDue
1c012f0a4a LibWeb: Remove "cached corner bitmap" and its use in the corner clipper
With the recording painter the actual painting operations are delayed,
so now if multiple corner clippers are constructed, and they use a
shared bitmap they can interfere with each other. The use of this shared
bitmap was somewhat questionable anyway, so this is not much of a loss.

This fixes the border-radius.html test page.
2023-10-21 23:16:17 +02:00
Nico Weber
1a58fee0fd LibPDF: Don't assert on named simple color space
If a PDF uses `/CustomName cs` and `/CustomName` then points at just a
name like `/DeviceGray` instead of an array, that's ok. Just using
`/DeviceGray cs` is simpler, so this extra level of indirection is
somewhat rare in practice, but it's valid and it does happen. So support
it.

We already have a helper that does the right thing that we just need to
call.
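
As a sketch of the lookup described above (Python, hypothetical names, PDF
names modeled as plain strings): the operand of `cs` is either a device
color space name used directly, or a key into the color space resources
whose entry may be an array or just a name.

```python
DIRECT_NAMES = {"DeviceGray", "DeviceRGB", "DeviceCMYK"}


def resolve_color_space(name: str, color_space_resources: dict):
    """Return (family, parameters) for the operand of a `cs` operator."""
    if name in DIRECT_NAMES:
        # `/DeviceGray cs`: the name is the color space.
        return (name, [])
    entry = color_space_resources[name]
    if isinstance(entry, str):
        # `/CustomName cs` where /CustomName points at just a name.
        return (entry, [])
    # `/CustomName cs` where /CustomName points at an array.
    family, *params = entry
    return (family, params)
```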

Together with #21524 and #21525, reduces number of crashes on 300 random
PDFs from the web (the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 29 (9%) to 25 (8%).
2023-10-21 21:04:26 +02:00
Nico Weber
3dd68f6026 Meta: Tweak test_pdf.py script
* Elide parser offsets to better group parser errors
* Use `backslashreplace` for decoding crash stacks so that we don't
  crash when printing crash stacks if the error output isn't valid
  utf-8
* Swap last two lines of output, reads a bit better
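
A short sketch of the first two tweaks, not the script's actual code. The
error message format and the `elide_offsets` regex are assumptions made for
the example; `bytes.decode(..., errors="backslashreplace")` is the standard
Python error handler named above.

```python
import re
from collections import Counter


def decode_stack(raw: bytes) -> str:
    # Invalid UTF-8 bytes become escapes like \xff instead of raising.
    return raw.decode("utf-8", errors="backslashreplace")


def elide_offsets(message: str) -> str:
    # Assumed message format: "... at offset 1234 ...".
    return re.sub(r"offset \d+", "offset N", message)


errors = [
    "Parser error at offset 12: unexpected token",
    "Parser error at offset 9876: unexpected token",
]
# With offsets elided, both messages fall into the same group.
groups = Counter(elide_offsets(e) for e in errors)
```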
2023-10-21 21:04:02 +02:00
Nico Weber
04aec4a032 LibPDF: Don't log CFF Copyright tag as unknown 2023-10-21 21:04:02 +02:00
Aliaksandr Kalenik
f32764975a LibWeb: Remove ClearRect command in RecordingPainter
There is only one use of the ClearRect command, and it can be replaced
with FillRect to make the set of commands in RecordingPainter smaller.
2023-10-21 18:50:28 +02:00
Nico Weber
8922574133 LibPDF: Fix assertion when destination page is an index
This isn't correct per spec, but it happens in practice, e.g.
0000847.pdf, 0000327.pdf, 0000124.pdf from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/
2023-10-21 09:10:30 +02:00
Nico Weber
fbd00d9c8e LibPDF: Use resolve_to on /Dests entry
Fixes an assertion if /Dests is an indirect object (`24 0 R`)
instead of an inline dictionary.
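
A loose Python analogue of resolving before use (`Reference`, `resolve` and
`resolve_as` are hypothetical names, not the LibPDF API): the same code path
then works whether /Dests is stored inline or as `24 0 R`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    index: int
    generation: int


def resolve(value, objects: dict):
    # Follow indirect references until a direct object is reached.
    while isinstance(value, Reference):
        value = objects[(value.index, value.generation)]
    return value


def resolve_as(expected_type, value, objects: dict):
    resolved = resolve(value, objects)
    if not isinstance(resolved, expected_type):
        raise TypeError(f"expected {expected_type.__name__}")
    return resolved


dests = {"Chapter1": [0, "XYZ"]}
objects = {(24, 0): dests}
inline_catalog = {"Dests": dests}
indirect_catalog = {"Dests": Reference(24, 0)}
```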
2023-10-21 09:10:30 +02:00
Nico Weber
8c3478a921 LibPDF: Use resolve_to() helper
No behavior change.
2023-10-21 09:10:30 +02:00
Nico Weber
801cfd5ae3 LibPDF: Let parser process filters by default
This fixes a small bug from 39b2eed3f6: That commit tried to disable
filters for the very first object read, for the case covered in
Tests/LibPDF/password-is-sup.pdf.

However, it accidentally also disabled filters by default.

Most of the time, this isn't really a difference: We call
`set_filters_enabled(true);` very early in
`DocumentParser::initialize_linearization_dict()`, which explicitly
enables filters, and `initialize_linearization_dict()` is the very
first thing called in `DocumentParser::initialize()`.

But there's an early exit in `initialize_linearization_dict()`
for the case where nothing that looks like an indirect object
follows the header. In that case we used to not enable
filtering, and would hand compressed streams to the operand parser.

(And due to a 2nd bug, we'd even do this if the header line was
followed by an empty line.)
2023-10-21 09:09:53 +02:00
Nico Weber
cf26fc2393 LibPDF: Make parser skip whitespace after header
0000990.pdf from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/
starts like so:

```
%PDF-1.7

4 0 obj
```

parse_header() used to put the cursor at the start of the 2nd,
empty, line. initialize_linearization_dict() would then check
if `m_reader.matches_number()` to see if there could possibly
be a linearization dict.

In this case, there isn't one, but we should detect linearization
dicts even if they're separated by whitespace from the first line.
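
A minimal Python sketch of the idea, using hypothetical helpers rather than
the DocumentParser code: skip the header line and any whitespace after it,
then check whether a number (the possible start of an indirect object)
follows.

```python
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def offset_after_header(data: bytes) -> int:
    """Return the offset of the first non-whitespace byte after line one."""
    pos = 0
    # Consume the header line itself, e.g. "%PDF-1.7".
    while pos < len(data) and data[pos] not in b"\r\n":
        pos += 1
    # Consume the line ending and any further empty lines or whitespace.
    while pos < len(data) and data[pos] in PDF_WHITESPACE:
        pos += 1
    return pos


def may_start_indirect_object(data: bytes, pos: int) -> bool:
    # Rough stand-in for a "matches number" check.
    return data[pos:pos + 1].isdigit()


data = b"%PDF-1.7\n\n4 0 obj\n"
pos = offset_after_header(data)
```

Without the second loop the cursor would sit on the empty line, and the
number check would fail even though `4 0 obj` follows.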
2023-10-21 09:09:53 +02:00
Nico Weber
5b36355be8 Meta: Add a script for rendering many PDFs in parallel
The rendering happens only in-memory, so this is only useful for
looking at the crash rate and the reports of missing features.

To actually see the output of a file, use

    pdf --render out.png --page N path/to/input.pdf

instead.
2023-10-21 09:09:21 +02:00
Nico Weber
34cb506bad LibPDF: Replace another TODO with a message
Like ca1a98ba9f, but for stroke color.
2023-10-21 09:09:06 +02:00
Aliaksandr Kalenik
719b12b19d LibWeb: Support alignment of abspos grid items
Grid items should respect alignment properties if top/right/bottom/left
are not specified.

This change adds a separate implementation of
layout_absolutely_positioned_element that is extended with support for
alignment.
2023-10-21 09:08:51 +02:00
Aliaksandr Kalenik
2def1de4be LibWeb: Rerun rows sizings if grid auto height is less than min-height
If the first pass of rows sizing results in the container's automatic
height being less than the specified min-height, we need to run a
second pass using the updated available space.
2023-10-21 09:08:11 +02:00
cflip
b7b57523cc Ports/ClassiCube: Update ClassiCube to version 1.3.6 2023-10-20 23:24:03 +02:00
Nico Weber
9442782881 LibPDF: Implement text_next_line_show_string_set_spacing
Not used terribly often, but e.g. used in 000333.pdf page 17 in
stillhq.com-pdfdb.
2023-10-20 14:24:31 -04:00
Nico Weber
78dea9500f LibPDF: Make operator parsing use ReadonlySpan instead of Vector
No behavior change.
2023-10-20 14:24:31 -04:00
Kemal Zebari
369f1d72ba AK/URLParser: Add spec comments to parse_opaque_host() 2023-10-20 12:20:55 -06:00
Kemal Zebari
2d27998f28 AK/URLParser: Complete is_url_code_point() implementation 2023-10-20 12:20:55 -06:00
Nico Weber
e0268dcc87 LibPDF: Allow /Pattern to be used directly as a color space name
Per spec:

"If the color space is one that can be specified by a name and no
additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain
cases of Pattern), the name may be specified directly."

We still don't implement /Pattern color spaces, but now we no longer
crash trying to look up the potentially-nonexistent /ColorSpace
dictionary on the page object when /Pattern is used directly as color
space name.

On top of #21514, reduces number of crashes on 300 random PDFs from the
web (the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 42 (14%) to 34 (11%).
2023-10-20 10:35:54 -06:00
Nico Weber
aea0e2f313 LibPDF: Rename ColorSpaceFamily function to may_be_specified_directly()
It used to be called ColorSpaceFamily::never_needs_parameters().

But in the cpp file, the macro arg was called ever_needs_parameters,
and the spec says

"If the color space is one that can be specified by a name and no
additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain
cases of Pattern), the name may be specified directly."

so let's use that language here.

No behavior change.
2023-10-20 10:35:54 -06:00
Nico Weber
095a2a17ed LibPDF: Replace TODO()s in Type0Font code with Errors
...which causes us to not render these fonts instead of crashing.

Reduces number of crashes on 300 random PDFs from the web (the first 300
from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 64 (21%) to 42 (14%).
2023-10-20 10:33:59 -06:00
Nico Weber
33443f7991 LibPDF: Implement ICCBasedColorSpace::number_of_components()
We now no longer crash on images that use an ICC-based color space.
Reduces number of crashes on 300 random PDFs from the web (the first 300
from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 81 (27%) to 64 (21%).

Also fixes all remaining crashes in
411_getting_started_with_instruments.pdf and
513_high_efficiency_image_file_format.pdf.
2023-10-20 08:58:52 +02:00
Nico Weber
f5d3f47af3 LibPDF: Add spec comment about color spaces on images 2023-10-20 08:58:52 +02:00
Nico Weber
7c24a89acf LibPDF: Add spec comment about valid bits_per_component values 2023-10-20 08:58:52 +02:00