
LibPDF: Tolerate trailing whitespace after %%EOF marker

At first I tried implementing the quirk from PDF 1.7 Appendix H,
3.4.4, "File Trailer": "Acrobat viewers require only that the %%EOF
marker appear somewhere within the last 1024 bytes of the file."
This would've been like #22548 but at end-of-file instead of at
start-of-file.

This helped a bunch of files, but also broke a bunch of files that
had more than 1024 bytes of stuff at the end, and it wouldn't have
helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF.
So just tolerate whitespace after the %%EOF line, and keep ignoring
an arbitrary amount of other stuff after that like before.

This helps:
* 0000599.pdf
  One trailing \0 byte after %%EOF. Due to that byte, the
  is_linearized() check fails and we go down the non-linearized
  codepath. But with this fix, that code path succeeds.
* 0000937.pdf
  Same.
* 0000055.pdf
  Has one space followed by a \n after %%EOF
* 0000059.pdf
  Has over 40kB of trailing \0 bytes

The following files keep working with it:
* 0000242.pdf
  5586 bytes of trailing HTML
* 0000336.pdf
  5586 bytes of trailing HTML fragment
* 0000136.pdf
  2054 bytes of trailing space characters
  This one kind of only worked by accident before since it found
  the %%EOF block before the final %%EOF block. Maybe this is
  even an intentional XRefStm compat hack? Anyways, now it
  finds the final block instead.
* 0000327.pdf
  11044 bytes of trailing HTML
Nico Weber 1 year ago
parent
commit
9d69c5d434
1 changed file with 1 addition and 0 deletions

+ 1 - 0
Userland/Libraries/LibPDF/DocumentParser.cpp

@@ -726,6 +726,7 @@ bool DocumentParser::navigate_to_before_eof_marker()
 
     while (!m_reader.done()) {
         m_reader.consume_eol();
+        m_reader.consume_whitespace();
         if (m_reader.matches("%%EOF")) {
             m_reader.move_by(5);
             return true;