More Missing Characters

A while ago I wrote about missing characters after merging PDF files. Since then, I’ve heard about a few more instances of missing characters and was able to debug one scenario. The following is a record of that. Unfortunately, there is no simple fix for this problem, but more about that at the end.

The symptoms of this problem are a bit different than the old case: Here, the problem is that the documents look good when displayed in Adobe Acrobat or Adobe Reader, but once printed, characters are missing. The first screenshot is from the document displayed in a PDF viewer, the second one shows what the document looked like after being printed:

PDF Displayed in PDF Viewer

PDF After Printing

The second line in the printed output clearly is missing characters at the end.

I had two reports about this problem, the common things were that both were using a Mac, and both were printing to network printers. The printers were from three different manufacturers, so it had to be something that crosses vendor boundaries.

A while ago, Apple introduced AirPrint, a mechanism to print from an iOS device without a driver to an AirPrint compatible printer. AirPrint uses the IPP protocol and the URF format. To make things simpler for printer manufacturers, the same mechanism is oftentimes also used for printing outside of the AirPrint environment, e.g. when printing from a Mac to a network connected printer. And that is exactly where we are running into problems with these jobs.

Mac OS X uses the CUPS system as it’s print sub-system. Apple actually purchased the rights to CUPS back in 2007. By now, what’s in Mac OS X is not pure CUPS (which is still open source software), but a proprietary system built on top of CUPS. This makes it a bit harder to figure out what’s going on.

Here is what happens when a PDF document is printed from Adobe Acrobat (or Adobe Reader): Acrobat (or Reader) converts the PDF file to PostScript. This PostScript file is then handed over to the CUPS system. And here, we can use the logging capabilities of CUPS to get information about what happens next: Logging is enabled by changing one line in the configuration file /etc/cups/cupsd.conf – change “LogLevel warn” to “LogLevel debug”. This gives us the following in the log file:

D [27/Oct/2015:11:39:24 -0400] [Job 112] 4 filters for job:
D [27/Oct/2015:11:39:24 -0400] [Job 112] pstoappleps (application/postscript to application/vnd.apple-postscript, cost 10)
D [27/Oct/2015:11:39:24 -0400] [Job 112] pstocupsraster (application/vnd.apple-postscript to image/urf, cost 250)
D [27/Oct/2015:11:39:24 -0400] [Job 112] - (image/urf to printer/EPSON_WF_3520_Series/image/urf, cost 10)
D [27/Oct/2015:11:39:24 -0400] [Job 112] - (printer/EPSON_WF_3520_Series/image/urf to printer/EPSON_WF_3520_Series, cost 0)
D [27/Oct/2015:11:39:24 -0400] [Job 112] job-sheets=none,none
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[0]="EPSON_WF_3520_Series"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[1]="112"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[2]="khk"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[3]="parts_combined.pdf"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[4]="1"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[5]="AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=parts_combined.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:68eacf38-6a65-3c4f-7a67-3e90cb0c7bc3 job-originating-host-name=localhost time-at-creation=1445960364 time-at-processing=1445960364 PageSize=Letter"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[6]="/private/var/spool/cups/d00112-001"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[0]=""
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[1]="CUPS_CACHEDIR=/private/var/spool/cups/cache"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[2]="CUPS_DATADIR=/usr/share/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[3]="CUPS_DOCROOT=/usr/share/doc/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[4]="CUPS_FONTPATH=/usr/share/cups/fonts"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[5]="CUPS_REQUESTROOT=/private/var/spool/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[6]="CUPS_SERVERBIN=/usr/libexec/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[7]="CUPS_SERVERROOT=/private/etc/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[8]="CUPS_STATEDIR=/private/etc/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[9]="HOME=/private/var/spool/cups/tmp"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[10]="PATH=/usr/libexec/cups/filter:/usr/bin:/usr/sbin:/bin:/usr/bin"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[11]="SERVER_ADMIN=root@MacBookPro2013.local"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[12]="SOFTWARE=CUPS/2.0.0"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[13]="TMPDIR=/private/var/spool/cups/tmp"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[14]="USER=root"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[15]="CUPS_MAX_MESSAGE=2047"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[16]="CUPS_SERVER=/private/var/run/cupsd"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[17]="CUPS_ENCRYPTION=IfRequested"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[18]="IPP_PORT=631"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[19]="CHARSET=utf-8"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[20]="LANG=en_US.UTF-8"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[21]="APPLE_LANGUAGE=en-US"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[22]="PPD=/private/etc/cups/ppd/EPSON_WF_3520_Series.ppd"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[23]="RIP_MAX_CACHE=128m"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[24]="CONTENT_TYPE=application/postscript"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[25]="DEVICE_URI=file:/tmp/epson.out"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[26]="PRINTER_INFO=EPSON WF-3520 Series (to file)"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[27]="PRINTER_LOCATION="
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[28]="PRINTER=EPSON_WF_3520_Series"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[29]="PRINTER_STATE_REASONS=none"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[30]="CUPS_FILETYPE=document"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[31]="FINAL_CONTENT_TYPE=image/urf"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[32]="AUTH_I****"
I [27/Oct/2015:11:39:24 -0400] [Job 112] Started filter /usr/libexec/cups/filter/pstoappleps (PID 81749)
I [27/Oct/2015:11:39:24 -0400] [Job 112] Started filter /usr/libexec/cups/filter/pstocupsraster (PID 81750)

With this information from the log file, we know everything we need to know to re-create this process. The following commands will run the same commands that are executed automatically when the print button is used (after capturing the PostScript file that Acrobat created from the spool area):

#!/bin/bash
export PPD=/private/etc/cups/ppd/EPSON_WF_3520_Series.ppd
/usr/libexec/cups/filter/pstoappleps 106 khk sample.pdf 1 "AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=sample.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:80d6bf80-1651-3095-51f5-185048379779 job-originating-host-name=localhost time-at-creation=1445884058 time-at-processing=1445884058 PageSize=Letter"  sample.ps  > sample.eps 
/usr/libexec/cups/filter/pstocupsraster  106 khk sample.pdf 1 "AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=sample.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:80d6bf80-1651-3095-51f5-185048379779 job-originating-host-name=localhost time-at-creation=1445884058 time-at-processing=1445884058 PageSize=Letter"  sample.ps > sample.raster 
/usr/libexec/cups/filter/rastertourf  106 khk sample.pdf 1 "AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=sample.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:80d6bf80-1651-3095-51f5-185048379779 job-originating-host-name=localhost time-at-creation=1445884058 time-at-processing=1445884058 PageSize=Letter"  sample.raster > sample.urf

So, what’s going on? We start out with a PostScript file and run it through pstoappleps, which seems to “normalize” the PostScript to something that Apple’s other utilities can work with. The next step is pstocupsraster, which is the actual PostScript interpreter that converts PostScript to the raster image, which is where the problem occurs. And the last step is rastertourf, which takes the raster image and wraps it into the URF format.

At the end of this script we end up with a URF file. In order to actually see what’s in the file, I convert that URF file to TIFF using the urftotiff utility from the urf_work project. We could also use Michael Sweet’s RasterView utility and look at the raster file before it gets converted to URF.

I can run this tool chain on many different PDF files without ever seeing the problem of missing characters, so what exactly triggers it? After analyzing a few files that end up with missing characters, it turns out that only text rendered with CID fonts is affected. At one point in time, it was pretty easy to create such files: Older versions of Adobe InDesign created PDF files with embedded CID fonts on a regular basis, and caused problems with non-Adobe RIPs in print workflows. Now it’s actually pretty complicated to end up with a CID font in a PDF file exported from InDesign. Interestingly enough, one of the test files I had access to was an InDesign file. I did however find a tool that always creates CID fonts in PDF files: wkhtmltopdf

I created a set of HTML files that I ran through wkhtmltopdf, and then merged in Adobe Acrobat into one PDF. Such a file, when printed to any printer on Mac OS that uses the tools mentioned above to convert PostScript to a raster format, will very likely show missing characters. My theory at this time is that when a certain CID font is subset embedded on multiple pages with different subsets, the Mac OS X PostScript to URF toolchain will not correctly interpret the subsets on each page – somehow it gets “stuck” with one subset, and if a page uses characters that are not represented in that subset, we end up with missing characters.

Here are the files that I’ve used: part1.html part2.html part3.html

Here are the individual PDF files and the merged file: part1.pdf part2.pdf part3.pdf merged.pdf

And finally, here is the PostScript file that Acrobat creates: sample.ps

Now that we know what causes this problem, is there anything we can do about it? Unfortunately, because most of the functionality is in Mac OS X proprietary files, we cannot fix the problem. All we can do is work around it by printing “as Image”, and wait for Apple to fix it on their end. I’ve tested this on both Mac OS X 10.10 (Yosemite) and 10.11 (El Capitan) with the same results.

This entry was posted in Acrobat, Apple, PDF and tagged , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *