The PDF Time Machine

Have you ever accidentally modified a PDF file and then hit the save button before you were able to make a copy of the unmodified document? It may seem that the only way back is to restore a backup of the file or to manually remove all the modifications you applied.

Not so fast… I may be able to offer you a time machine.

Before you do anything, make a copy of your PDF file and keep it in a safe place, then make a second copy to work with. Do not make this copy by selecting File>Save As: that may destroy any chance of going back to older versions of the document. Instead, use the file manager your operating system provides (Windows Explorer or Finder) to create a copy of the file.

Now comes the crucial question that will determine if we can actually go back in time:

Did you save your document with File>Save, or did you use File>Save As and select the original filename (thereby overwriting the original file)? There is a big difference between these two save mechanisms, and if you have only used File>Save, you may be able to recover your document.

Let’s take a look at the differences between these two ways of saving a PDF file:

File>Save creates “incremental updates”: Acrobat leaves the original file as is and only appends new or modified information to the end of the file. Each incremental update ends with its own control information (a cross-reference section and trailer, followed by “startxref” and the “%%EOF” marker), and that trailer points back to the previous cross-reference section. This allows a PDF processor to walk through all incremental updates in reverse order and finally arrive at the original file. This means that such a PDF file contains a record of every change since the last time a full save operation was performed. When a file contains many incremental updates, opening it will be slower than opening a file without any. Acrobat used to warn the user about this and suggest combining and flattening all these incremental updates. I have not seen this warning in a long time, so I assume that current versions of Acrobat no longer display it.

When you use File>Save As, the PDF file is rewritten from scratch, and all incremental updates are merged with the original content. Once this is done, there is no longer any record of the changes that were applied to the file since it was last rewritten or originally created.
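To see this structure for yourself, here is a minimal Python sketch (my own illustration, not a tool used in this article) that counts the revisions in a file. It relies on the fact that every revision, the original one as well as each incremental update, ends with the keyword “startxref”, the byte offset of that revision’s cross-reference section, and the “%%EOF” marker. The script name pdf_revisions.py is hypothetical, and because it is a plain pattern search over the raw bytes, compressed stream data could in rare cases produce a false match, so treat the result as a hint rather than a definitive count.

```python
import re
import sys
from pathlib import Path

# Each revision of a PDF file ends with:
#   startxref
#   <byte offset of that revision's cross-reference section>
#   %%EOF
# The whitespace between the tokens may be any mix of spaces, CR and LF.
REVISION_END = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def xref_offsets(data: bytes) -> list[int]:
    """Return the cross-reference offset of every revision, oldest first."""
    return [int(m.group(1)) for m in REVISION_END.finditer(data)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python pdf_revisions.py FILE.pdf")
    else:
        path = Path(sys.argv[1])
        offsets = xref_offsets(path.read_bytes())
        print(f"{path}: {len(offsets)} revision(s)")
        for number, offset in enumerate(offsets, start=1):
            print(f"  revision {number}: cross-reference section at byte {offset}")
```

A count of one suggests that the file was last written with a full save, so there are no incremental updates to go back to.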

We can use this knowledge to our advantage and go back in time to a previous version of the PDF file.

The following steps are very technical, and you may opt not to do this. If you do, and things go wrong, you have your backup copy and the original file, so you can start over.

You need a binary-safe editor: a text editor that can modify files that contain binary data. You can, for example, use Notepad++ on Windows or TextMate or BBEdit on a Mac. There are other options; the key is that the editor must not modify any data in the file on its own (e.g., by replacing line-ending characters with the ones customary on the current operating system). To test whether a certain application will work, I open a PDF file in it and then use the editor’s File>Save As to create a new copy of that file. If both files are identical (and both open without any errors or complaints in Adobe Acrobat), the tool will work. Microsoft Word or any other word processor will not work.
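To check the result of this round-trip test without relying on your eyes, a byte-for-byte comparison does the job. The following is a minimal Python sketch (the script name compare.py is hypothetical) that reports whether two files are identical and, if not, the offset of the first differing byte, which often points straight at a line ending the editor rewrote.

```python
import sys
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first byte that differs, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they differ where the shorter one ends.
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python compare.py ORIGINAL.pdf RESAVED.pdf")
    else:
        original = Path(sys.argv[1]).read_bytes()
        resaved = Path(sys.argv[2]).read_bytes()
        offset = first_difference(original, resaved)
        if offset is None:
            print("Identical: the editor did not change any bytes.")
        else:
            print(f"Files differ starting at byte {offset}: do not use this editor.")
```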

Open the file and go to the end of the document. Make sure that you do not accidentally modify anything in the document.

Screenshot of the end of a PDF file loaded in a text editor

The “%%EOF” marker may not actually be on a line of its own, depending on which text editor you use, how it interprets end-of-line characters, and how the PDF file was generated. In the sample, the last line is surrounded by symbols that represent carriage return characters. The key is to find the last occurrence of “%%EOF”, which in the sample is on that last line.

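Once you are at the end of the document, search backwards for the string “%%EOF”. The first match is the marker at the very end of the file (usually the contents of the last line). Search backwards once more to find the previous occurrence, which marks the end of the file as it was before your last save.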

Screenshot of the search for the previous “%%EOF” in the text editor

After you find the previous “%%EOF” sequence, delete everything after that line, all the way to the end of the document. Keep in mind that you may have to keep the end-of-line character(s) used in your particular PDF file after that instance of “%%EOF”.
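The manual edit described above (find the last “%%EOF”, search backwards for the one before it, and cut everything after it while keeping its line ending) can also be scripted. This is a minimal Python sketch under a few assumptions: the markers can be found with a plain byte search (stream data that happens to contain “%%EOF” would confuse it), and the script name go_back.py is hypothetical. It never modifies the input file and writes the older version to a new file, just like saving under a new name.

```python
import sys
from pathlib import Path

EOF_MARKER = b"%%EOF"


def end_of_marker(data: bytes, pos: int) -> int:
    """Offset just past the marker at pos, including a CR, LF or CRLF after it."""
    end = pos + len(EOF_MARKER)
    if data.startswith(b"\r\n", end):
        return end + 2
    if data[end:end + 1] in (b"\r", b"\n"):
        return end + 1
    return end


def previous_version(data: bytes) -> bytes:
    """Return the file contents as they were before the last incremental save."""
    last = data.rfind(EOF_MARKER)
    if last == -1:
        raise ValueError("no %%EOF marker found; is this a PDF file?")
    # Search backwards once more, looking only at the bytes before the last marker.
    previous = data.rfind(EOF_MARKER, 0, last)
    if previous == -1:
        raise ValueError("only one %%EOF marker; there is no older version")
    # Keep the previous marker and its line ending, drop everything after it.
    return data[:end_of_marker(data, previous)]


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python go_back.py WORKING-COPY.pdf fixed.pdf")
    else:
        source = Path(sys.argv[1]).read_bytes()
        Path(sys.argv[2]).write_bytes(previous_version(source))
```

Running it again on fixed.pdf goes back one more incremental update.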

Screenshot of the file after everything following the previous “%%EOF” has been deleted

Save the document (preferably under a new name, e.g., fixed.pdf), then open the newly created document in Acrobat. It should contain everything that was in your document before you last saved it.

By repeating this until only the first “%%EOF” in the file is left, we can recreate every version of the PDF file that was saved with File>Save. This sample PDF document contains four lines that were added between save operations, so by going back one incremental update at a time, we can create the file with three, two, one, and zero lines of text.
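Repeating that step until only the first “%%EOF” is left can be automated as well. The Python sketch below (the script name all_versions.py and the output names version-1.pdf, version-2.pdf and so on are only illustrations) writes one file per revision, oldest first. As before, it assumes that a plain byte search for “%%EOF” finds only real end-of-revision markers.

```python
import sys
from pathlib import Path

EOF_MARKER = b"%%EOF"


def revision_ends(data: bytes) -> list[int]:
    """Offsets just past every "%%EOF" marker (plus its CR, LF or CRLF)."""
    ends = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        end = pos + len(EOF_MARKER)
        if data.startswith(b"\r\n", end):
            end += 2
        elif data[end:end + 1] in (b"\r", b"\n"):
            end += 1
        ends.append(end)
        pos = data.find(EOF_MARKER, end)
    return ends


def all_versions(data: bytes) -> list[bytes]:
    """Every saved version of the file, oldest (original) first."""
    return [data[:end] for end in revision_ends(data)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python all_versions.py WORKING-COPY.pdf")
    else:
        data = Path(sys.argv[1]).read_bytes()
        for number, version in enumerate(all_versions(data), start=1):
            out = Path(f"version-{number}.pdf")
            out.write_bytes(version)
            print(f"wrote {out} ({len(version)} bytes)")
```

For the sample document, this should produce five files, with zero through four lines of text; the highest-numbered file should normally be identical to the working copy.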

Original version:

Screenshot of the original version of the sample document

Recovered version, one level back:

Screenshot of the recovered version of the sample document

Be very careful not to change anything in the file other than removing the portion after the previous “%%EOF”.
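One way to double-check that you removed only the part after the previous “%%EOF” is to confirm two things: the recovered file must be an exact, shorter byte-for-byte prefix of your working copy, and it must end with “%%EOF”, optionally followed by end-of-line characters. Here is a minimal Python sketch of that check; the function and script names are illustrative.

```python
import sys
from pathlib import Path


def looks_like_clean_cut(working: bytes, recovered: bytes) -> bool:
    """True if recovered is a shorter, unmodified prefix of working that ends
    with a "%%EOF" marker (optionally followed by CR/LF characters)."""
    if len(recovered) >= len(working) or not working.startswith(recovered):
        return False
    # Ignore trailing CR/LF characters, then require the marker itself.
    return recovered.rstrip(b"\r\n").endswith(b"%%EOF")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python check_cut.py WORKING-COPY.pdf fixed.pdf")
    else:
        working = Path(sys.argv[1]).read_bytes()
        recovered = Path(sys.argv[2]).read_bytes()
        if looks_like_clean_cut(working, recovered):
            print("OK: only bytes after a %%EOF marker were removed.")
        else:
            print("Warning: the recovered file is not a clean prefix of the working copy.")
```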

Again, this only works if the file was saved with incremental updates (File>Save). When you use File>Save As, the file is generated from scratch, and all incremental updates are flattened.

If this is not something you feel comfortable doing, and you are willing to make use of my professional services, please feel free to get in touch with me. My contact information is on the “About” page.

This entry was posted in Acrobat, PDF, Tutorial.

5 Responses to The PDF Time Machine

  1. daniele says:

    Hi, I have read your blog a lot. I would like to ask whether there is a way to export data from Reader DC with this script:

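    console.clear();
    // French prompt: "Export the data?"; icon 2 = question, type 2 = Yes/No buttons
    var reponse = app.alert("Exporter les donn\u00E9es ?", 2, 2);
    if (reponse == 4) // 4 = the user clicked "Yes"
    {
        this.exportAsFDF(
        {
            aFields: ["date", "Dropdown", "Saisie.1.1"]
        });
    }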

    This script works in Acrobat Pro DC,
    but it does not work in Reader DC.
    Thanks

  2. Karl Heinz Kremer says:

    daniele, that is correct, it will work in Acrobat but not in the free Reader. When you check the quick bar in the SDK documentation, you will see that you need “Forms Rights” applied to the document, which you can only do with Adobe’s LiveCycle Server solution.

    https://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/index.html#t=Acro12_MasterBook%2FJS_API_AcroJS%2FDoc_methods.htm%23TOC_exportAsFDFbc-26&rhtocid=_6_1_8_23_1_25

  3. Daniele says:

    Thank you for the quick response.
    One last question: if I do not buy Adobe LCE, is there no way to authorize Reader DC to export form data?
    thank you very much

  4. Daniele says:

    thank you very much

  5. Karl Heinz Kremer says:

    No, you need LC for Reader Extensions to do that. Acrobat can add some extended rights, but not the “Forms” rights.
