Have you ever wondered what’s in a PDF file? I don’t mean the content, but how information is actually stored in a PDF document. There is, of course, the PDF Reference, which explains in excruciating detail how a PDF file is structured. But even after reading the whole document (by now a file of over 13MB) several times, you still need to look at actual PDF content to make sense of it.
There are a number of tools available that allow you to look into a PDF file:
- My personal favorite is still the Enfocus Browser v2.1, which unfortunately is no longer available. I use it on both my Macs and my Windows machines.
- Apple’s Voyeur application comes as a sample with their developer tools. You can find it in the directory /Developer/Examples/Quartz/PDF/Voyeur. Apple only provides the source code, so the application needs to be compiled before it can be used.
- WindJack’s PDF Can Opener is a Windows-only plug-in for Acrobat that does some amazing things. It is a bit on the expensive side if you just want to play around with it. If your day job centers on PDF, however, it is one of the best tools available. WindJack offers a 10-day trial version.
Anybody who’s dabbled in Acrobat plug-in programming has probably written some form of such a tool.
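To give an idea of what such a tool does at its simplest, here is a minimal sketch in Python (not an Acrobat plug-in). It relies on the fact that a PDF file stores its data as indirect objects written as "number generation obj ... endobj". The function name `list_objects` and the `SAMPLE` bytes are made up for illustration, and `SAMPLE` is only a fragment, not a complete PDF (it has no cross-reference table). The regex scan is an assumption that holds only for simple files: it will miss objects packed into object streams and can be fooled by binary stream data that happens to contain "endobj".

```python
import re

# One indirect object: "<number> <generation> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def list_objects(data: bytes):
    """Return (object number, generation, start of body) for each object found."""
    found = []
    for m in OBJ_RE.finditer(data):
        # Keep only the part before any "stream" keyword, so that binary
        # stream data does not end up in the preview.
        body = m.group(3).split(b"stream", 1)[0].strip()
        found.append((int(m.group(1)), int(m.group(2)), body[:60].decode("latin-1")))
    return found


# Hypothetical, hand-written fragment used only to exercise the function.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Size 4 >>\n%%EOF\n"
)

for num, gen, body in list_objects(SAMPLE):
    print(f"{num} {gen} obj: {body}")
```

To try it on a real file, read the file in binary mode and pass the bytes to `list_objects`. A real browser tool would use the cross-reference table instead of scanning, but the flat list already shows how much of a PDF is just dictionaries pointing at each other.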
With Acrobat 8, Adobe finally packaged a tool to view the PDF guts with Acrobat Professional. It’s a bit hard to find, and not well documented. Let’s see if I can help a bit…
The “Browse Internal PDF Structure” tool is hidden in the Preflight tool. Once you find it, you still have to figure out how to actually enable the menu item. Load a PDF document into Acrobat and select the Preflight tool by using the “Advanced>Preflight…” menu item. This will bring up the “Preflight” dialog.
Under the “Options” menu, you can find the “Browse Internal PDF Structure” item, but it’s grayed out. To enable this menu item, you first have to run a preflight check. Make sure that the profile you run does not modify the PDF document (e.g. by fixing potential problems with the file). A modified document would defeat the purpose of looking at the PDF structure: what you see would no longer be what’s in the original PDF document.
I usually run the “Compatible with Acrobat 3” check under “Acrobat/PDF version compatibility”; this profile will not modify the PDF file. Once the profile has been executed, the “Browse…” menu item will be available and, when selected, will display the “Internal PDF Structure” window.
With this tree view of the PDF structure and the PDF Reference you should be able to get a pretty good understanding of how PDF “works”.
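For readers without Acrobat Professional at hand, the idea behind such a tree view can be sketched in a few lines of Python. This is not how Acrobat does it; it is a rough illustration that starts at the /Root entry of the trailer and follows indirect references ("n g R") from object to object. All names (`parse_objects`, `tree_lines`, `SAMPLE`) are hypothetical, the sample is a hand-written fragment rather than a complete PDF, and the sketch assumes a simple file with a classic "trailer" keyword and no object streams or incremental updates.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(?<!\d)(\d+)\s+\d+\s+R\b")    # indirect reference "n g R"
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")              # e.g. "/Type /Page"
ROOT_RE = re.compile(rb"trailer.*?/Root\s+(\d+)\s+\d+\s+R\b", re.DOTALL)


def parse_objects(data: bytes):
    """Map object number -> object body (with any stream data cut off)."""
    objects = {}
    for m in OBJ_RE.finditer(data):
        objects[int(m.group(1))] = m.group(3).split(b"stream", 1)[0]
    return objects


def tree_lines(data: bytes):
    """Return indented lines describing the objects reachable from /Root."""
    objects = parse_objects(data)
    root = ROOT_RE.search(data)
    if root is None:
        return []
    lines, seen = [], set()

    def walk(num, depth):
        if num in seen:          # /Parent entries point back up the tree
            return
        seen.add(num)
        body = objects.get(num)
        if body is None:
            label = "(not found)"
        else:
            t = TYPE_RE.search(body)
            label = t.group(1).decode("ascii") if t else "(no /Type)"
        lines.append("  " * depth + f"{num}: {label}")
        if body is not None:
            for ref in REF_RE.finditer(body):
                walk(int(ref.group(1)), depth + 1)

    walk(int(root.group(1)), 0)
    return lines


# Hypothetical, hand-written fragment used only to exercise the functions.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Size 4 >>\n%%EOF\n"
)

print("\n".join(tree_lines(SAMPLE)))
```

For the sample, this prints the Catalog at the top, the Pages node below it, and the single Page below that, which is the same Catalog-to-page hierarchy the “Internal PDF Structure” window lets you expand by hand. The `seen` set is there because page objects refer back to their parent, so a naive walk would never terminate.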
There are a number of interesting options when displaying the page content stream, but I’ll leave that for another post.