Commit graph

12 commits

Author SHA1 Message Date
Rodrigo Tobar
64bbe431b5 LibPDF: Add char_code -> name mapping function
We already keep both mappings internally, now it's time to actually use
it.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
286e3e6872 LibPDF: Simplify Encoding to align with simple font requirements
All "Simple Fonts" in PDF (all but Type0 fonts) have the property that
glyphs are selected with single byte character codes. This means that
the Encoding objects should use u8 for representing these character
codes. Moreover, and as mentioned in a previous commit, there is no need
to store the unicode code point associated with a character (which was
in turn wrongly associated to a glyph).

This commit greatly simplifies the Encoding class. Namely it:

 * Removes the unnecessary CharDescriptor class.
 * Changes the internal maps to be u8 -> FlyString and vice-versa,
   effectively providing two-way lookups.
 * Adds a new method to set a two-way u8 -> FlyString mapping and uses
   it in all possible places.
 * Simplified the creation of Encoding objects.
 * Changes how the WinAnsi special treatment for bullet points is
   implemented.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
7c42d6c737 LibPDF: Fix ZapfDingbat's char codes
The initial values were fine, but those starting at 100 were wrong: they
are all octal values, but since they were missing an initial 0 they were
interpreted as decimals.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
2f773b3c5c LibPDF: Stop storing unicode code points in Encoding
In PDF's fonts, encoding objects are used to translate bytes into fonts'
glyphs. Glyphs (in the fonts we currently support) organise their glyphs
in such a way that they are accessed by name, and thus encoding
translate between a byte sequence and a glyph name.

Note that an no point this translation includes a Unicode character, and
therefore assigning a character to a glyph in the Encoding object is the
wrong thing to do. Moreover, using the code point for this character
during the byte-sequence-to-glyph translation sequence is double-wrong.

This commit removes the characters associated to each translation in the
built-in Encoding objects. In order to keep commits short and sweet, I'm
currently simply removing the character from the enumeration, leaving
the old structure this information was held on intact. Instead, I'm
filling the "code_point" member with a zero, and filling both mappings
(which will be changed later on too) with the glyph name and the
associated char code.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
1ec4ad5eb6 LibPDF: Add name -> char code conversion in Encoding
This is an operation that was already being done (sub-optimally) in
PS1FontProgram, so we are replacing that. We will use this during CFF
parsing too.
2023-01-25 15:40:11 +01:00
Andreas Kling
d6a3be1615 LibPDF: Add missing character quirk for WinAnsiEncoding fonts
Fonts with the encoding name "WinAnsiEncoding" should render missing
characters above character code 040 (octal) as a "bullet" character.

This patch adds Encoding::should_map_to_bullet(char_code) which is then
called by char_code_to_code_point() to check if the given char code
should be displayed as a bullet instead.

I didn't have a good way to test this, so I've only verified that it
works by manually overriding inputs to the function during the rendering
stage.

This takes care of a FIXME in the Annex D part of the PDF specification.
2022-12-08 09:54:20 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Julian Offenhäuser
b14f0950a5 LibPDF: Add very basic support for Adobe Type 1 font rendering
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.

We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
6225c03256 LibPDF: Rename argument for the latin character set enumeration macro
The previous name "V" collided with one of the entries.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
04cb00dc9a LibPDF: Fix handling of differences array in custom encodings
When looking up differences in the specified encoding, we previously
didn't recognize a lot of characters, namely those that are referred to
by a string in the PDF itself, like "/germandbls".

We now create a mapping of those characters to the code points they are
referring to, and correctly look them up when needed.
2022-09-17 10:07:14 +01:00
Matthew Olsson
8441fa2bc4 LibPDF: Add support for builtin and custom Encodings 2022-03-29 02:52:57 +02:00