Selective Flattening of Form Fields Using ABCpdf

Let’s assume you have a form with a lot of form fields, and somewhere during your forms workflow, you want to flatten some fields to “burn in” the information and convert it from an interactive field to static PDF content. There are different ways to accomplish this. One option is to do this all in Acrobat’s JavaScript. Unfortunately, when you lookup the documentation for the Doc.flattenPages() method, it does not talk about selecting only a subset of fields (unless these fields are all on one specific page, and that page does not contain any fields that should not be flattened). But, there is an optional parameter that allows us to specify what should happen to fields that are set to only be displayed in the viewer, but not be printed: The “nNonPrint” parameter, when set to the numeric value “1”, will leave
these fields alone, and will not flatten them (a value of “0” will flatten them and a value of “2” will remove them from the document).

This allows us to modify the document before we call the Doc.flattenPages() method: If we turn all fields that should not be flattened into non-printing fields, then flatten the document, and then in a last step reverse that first change, we can do flatten only a subset of fields. The problem gets a bit more complex when we already have non-printing fields in the document that we want to remain non-printing, or if we have non-printing fields that we want to flatten. That will require quite a bit of logic to save the original state of all fields, turn those fields we want to flatten into printing form fields, turn the fields that we don’t want to flatten into non-printing fields, flatten the document, and then restore the original settings again. As I said, a bit complex.

As long as we don’t need to do this within Acrobat, there is another solution: The ABCpdf library that I’ve mentioned before provides functionality to flatten fields on a per-field basis. [ Full disclosure: The fine folks at webSupergoo provided me with a free license to ABCpdf 8 based on that old blog post. That’s what I am still using for my experiments. They are up to version 10 by now, and based on their feature comparison chart, there are a few new features in this version that sound interesting. ]

Now, when you search for “flattening” in the ABCpdf manual, you won’t find anything that will help you in this case: The only flattening I found in my version is for flattening layers, and versions 9 and 10 support transparency flattening. Nothing about form fields. It took me a bit of head scratching and exploring the Form and Field APIs to figure out that form or field flattening is actually done via a process called “stamping”. Once that hurdle was cleared, it was pretty simple to come with a routine to flatten just one field at a time. Here is the VB code I’ve used (the same will also work in C# or via any of the other interfaces that ABCpdf supports):

' This assumes that we already have an open document 'theDoc'
Dim theField As Field

' set the field 'Text1' to some value
theField = theDoc.Form.Fields("Text1")
theField.Value = "Some value"

' flatten one form field
theField.Stamp()

' flatten all form fields in the document
theDoc.Form.Stamp()

Just to demonstrate how to flatten a whole document, I’ve added that as the last line in this snippet.

This makes it very simple to e.g. define an array of field names to be flattened, and then just loop over that array, get the field, and flatten one at a time.

This feature in ABCpdf is very useful, and easy to use – even though it has a name that is a bit confusing to somebody who works with a different kind of PDF Stamps on a regular basis 🙂

Posted in PDF, Programming | Tagged , , , | Leave a comment

Batch-Import List Data into PDF Form

We’ve talked about batch importing data from an Excel document into a PDF form before: http://khkonsulting.com/2015/10/batch-import-excel-data-into-pdf-forms/

Back then, the idea was that we would have a spreadsheet with rows of data, and every row would be imported into a new copy of the document (e.g. a mail merge type application).

What if we want to import rows of data into one document? Let’s assume, we have a table in our document with 20 rows, and we have a spreadsheet that has the same 20 rows, and we want to import that data. If we would use the method from above, the whole form data would have to be rewritten into a spreadsheet with just one row of data. Let’s do a simple example with a small table:

Firstname Lastname
John Doe
Jane Doe

The data in the PDF file would be organized like this:

Firstname_1 Lastname_1
Firstname_2 Lastname_2

Our datafile would then look like this:

Firstname_1	Lastname_1	Firstname_2	Lastname_2
John	Doe	Jane	Doe

The blank spaces between the entries in the list above are TAB characters – remember, we need a tab separated text file for this to work.

This is simple to do for a small files, but if you are dealing with 100 records of 10 fields each, we are talking about a pretty extensive row of data. It can certainly be done, especially if a VBA macro is used, but it’s not what I would do.

The good news is that this can still be done using JavaScript. The trick is to work with two documents: We of course start out with the document that we want to populate with data, but in addition to that, we are also using a temporary document that has one set of fields for each record we are going to read. This way, we can read one record at a time, and then copy that record into our final form. This temporary form will be created on the fly using JavaScript. And, because we will never actually look at that form, we don’t have to worry about placing the form fields in any meaningful way. In my example further down, I am placing all fields right on top of each other.

Let’s assume we want to create a sign-in sheet for an event, and we want to print the full name of each participant and the participant’s email address on the form:

Blank form

You can download the documents from these links:

Blank form
Form with fields

The following script can for example be used as an Action, or a custom command in Acrobat DC:

/* Import list data from tab delimited text file */

var dataFile = "/Users/username/data.txt"; // !!! CHANGE THIS !!!

// Create a temporary document and add the three text fields
// that correspond with the three columns in our data file.
var tmpDoc = app.newDoc();
tmpDoc.addField("Firstname", "text", 0, [0, 0, 100, 100]);
tmpDoc.addField("Lastname", "text", 0, [0, 0, 100, 100]);
tmpDoc.addField("Email", "text", 0, [0, 0, 100, 100]);

// Iterate over the data file and import the corresponding record
// from the data file and then copy that data to the corresponding
// data row.
var err = 0;
var idx = 0;
while (err == 0) {
	err = tmpDoc.importTextData(dataFile, idx); // imports the next record

	if (err == -1)
		app.alert("Error: Cannot Open File");
	else if (err == -2)
		app.alert("Error: Cannot Load Data");
	else if (err == 1)
		app.alert("Warning: Missing Data");
	else if (err == 2)
		app.alert("Warning: User Cancelled Row Select");
	else if (err == 3)
		app.alert("Warning: User Cancelled File Select");
	else if (err == 0) {
		// collect the data and add it to the "real" form
		var name = tmpDoc.getField("Lastname").value + ",\n" +
			tmpDoc.getField("Firstname").value;
		var email = tmpDoc.getField("Email").value;

		this.getField("Name" + (idx + 1)).value = name; // we need to adjust the index by one
		this.getField("Email" + (idx + 1)).value = email;
		if (idx == 19) {
			// we can only process 20 records on each sheet
			err = -99;
		}
	}
	idx++;
}

// cleanup
tmpDoc.closeDoc(true);

Here is the sample file I’ve used as Excel Spreadsheet and as tab delimited text file (this is random data thanks to the “Fake Name Generator“.

There are a few potential problems when you are using this approach: First of all, the same limitations that apply to manually importing data apply here as well: You need to make sure that each column in your data file is represented by a field in your temporary file, and that the names match. There cannot be any extra fields either. And, in this particular case, you will have to make sure that you only import up to the same number of records that your document can actually handle. My document uses 20 records, so I am checking for that in my script. If you need continuation pages, you can certainly do that, it would make the script a bit more complex.

If you want to adapt this approach for your own solution, make sure that you add all required fields to your temporary document. In the example above, I am only using text fields, but the same technique can be used for other field types as well.

Let me know if this works for you.

Posted in Acrobat, JavaScript, Tutorial | Tagged , , , , , , | 13 Comments

Using Custom Dynamic PDF Stamps in Adobe Acrobat or Adobe Reader

There is sometimes a misconception about how a custom dynamic stamp works in the PDF environment. Here is a short video that demonstrates how such a stamp gets applied:

As you can see, the custom dialog that collects the data that will be placed as part of the stamp pops up after the stamp is placed. Once all information is entered, and the “OK” button (how that button is labeled is up to the stamp’s implementation) is pressed, that data gets merged into the stamp and the stamp gets displayed. From that point on, that data can no longer be changed. It’s like placing a rubber stamp on a piece of paper. Once it’s there, it can no longer be modified. There are however a few things we can do with a PDF stamp: We can move the stamp, resize and rotate it – as shown in the video.

In the second part of the video, I am demonstrating how to use the “Stamp” tool button on the toolbar. In order to have this button available, you will need to add it to your toolbar. For more information about how to customize the Acrobat user interface, see for example here: http://blogs.adobe.com/acrolaw/2013/03/customizing-the-acrobat-xi-interface/

This article is not supposed to be a “tell all” story about dynamic stamps, I just wanted to demonstrate the basic workflow involved in placing a custom dynamic stamp. If you have questions about these types of stamps, please post in the comments.

Posted in Acrobat, PDF | Tagged , , , , , | 7 Comments

More Missing Characters

A while ago I wrote about missing characters after merging PDF files. Since then, I’ve heard about a few more instances of missing characters and was able to debug one scenario. The following is a record of that. Unfortunately, there is no simple fix for this problem, but more about that at the end.

The symptoms of this problem are a bit different than the old case: Here, the problem is that the documents look good when displayed in Adobe Acrobat or Adobe Reader, but once printed, characters are missing. The first screenshot is from the document displayed in a PDF viewer, the second one shows what the document looked like after being printed:

PDF Displayed in PDF Viewer

PDF After Printing

The second line in the printed output clearly is missing characters at the end.

I had two reports about this problem, the common things were that both were using a Mac, and both were printing to network printers. The printers were from three different manufacturers, so it had to be something that crosses vendor boundaries.

A while ago, Apple introduced AirPrint, a mechanism to print from an iOS device without a driver to an AirPrint compatible printer. AirPrint uses the IPP protocol and the URF format. To make things simpler for printer manufacturers, the same mechanism is oftentimes also used for printing outside of the AirPrint environment, e.g. when printing from a Mac to a network connected printer. And that is exactly where we are running into problems with these jobs.

Mac OS X uses the CUPS system as it’s print sub-system. Apple actually purchased the the rights to CUPS back in 2007. By now, what’s in Mac OS X is not pure CUPS (which is still open source software), but a proprietary system built on top of CUPS. This makes it a bit harder to figure out what’s going on.

Here is what happens when a PDF document is printed from Adobe Acrobat (or Adobe Reader): Acrobat (or Reader) converts the PDF file to PostScript. This PostScript file is then handed over to the CUPS system. And here, we can use the logging capabilities of CUPS to get information about what happens next: Logging is enabled by changing one line in the configuration file /etc/cups/cupsd.conf – change “LogLevel warn” to “LogLevel debug”. This gives us the following in the log file:

D [27/Oct/2015:11:39:24 -0400] [Job 112] 4 filters for job:
D [27/Oct/2015:11:39:24 -0400] [Job 112] pstoappleps (application/postscript to application/vnd.apple-postscript, cost 10)
D [27/Oct/2015:11:39:24 -0400] [Job 112] pstocupsraster (application/vnd.apple-postscript to image/urf, cost 250)
D [27/Oct/2015:11:39:24 -0400] [Job 112] - (image/urf to printer/EPSON_WF_3520_Series/image/urf, cost 10)
D [27/Oct/2015:11:39:24 -0400] [Job 112] - (printer/EPSON_WF_3520_Series/image/urf to printer/EPSON_WF_3520_Series, cost 0)
D [27/Oct/2015:11:39:24 -0400] [Job 112] job-sheets=none,none
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[0]="EPSON_WF_3520_Series"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[1]="112"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[2]="khk"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[3]="parts_combined.pdf"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[4]="1"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[5]="AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=parts_combined.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:68eacf38-6a65-3c4f-7a67-3e90cb0c7bc3 job-originating-host-name=localhost time-at-creation=1445960364 time-at-processing=1445960364 PageSize=Letter"
D [27/Oct/2015:11:39:24 -0400] [Job 112] argv[6]="/private/var/spool/cups/d00112-001"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[0]=""
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[1]="CUPS_CACHEDIR=/private/var/spool/cups/cache"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[2]="CUPS_DATADIR=/usr/share/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[3]="CUPS_DOCROOT=/usr/share/doc/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[4]="CUPS_FONTPATH=/usr/share/cups/fonts"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[5]="CUPS_REQUESTROOT=/private/var/spool/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[6]="CUPS_SERVERBIN=/usr/libexec/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[7]="CUPS_SERVERROOT=/private/etc/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[8]="CUPS_STATEDIR=/private/etc/cups"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[9]="HOME=/private/var/spool/cups/tmp"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[10]="PATH=/usr/libexec/cups/filter:/usr/bin:/usr/sbin:/bin:/usr/bin"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[11]="SERVER_ADMIN=root@MacBookPro2013.local"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[12]="SOFTWARE=CUPS/2.0.0"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[13]="TMPDIR=/private/var/spool/cups/tmp"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[14]="USER=root"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[15]="CUPS_MAX_MESSAGE=2047"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[16]="CUPS_SERVER=/private/var/run/cupsd"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[17]="CUPS_ENCRYPTION=IfRequested"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[18]="IPP_PORT=631"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[19]="CHARSET=utf-8"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[20]="LANG=en_US.UTF-8"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[21]="APPLE_LANGUAGE=en-US"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[22]="PPD=/private/etc/cups/ppd/EPSON_WF_3520_Series.ppd"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[23]="RIP_MAX_CACHE=128m"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[24]="CONTENT_TYPE=application/postscript"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[25]="DEVICE_URI=file:/tmp/epson.out"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[26]="PRINTER_INFO=EPSON WF-3520 Series (to file)"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[27]="PRINTER_LOCATION="
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[28]="PRINTER=EPSON_WF_3520_Series"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[29]="PRINTER_STATE_REASONS=none"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[30]="CUPS_FILETYPE=document"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[31]="FINAL_CONTENT_TYPE=image/urf"
D [27/Oct/2015:11:39:24 -0400] [Job 112] envp[32]="AUTH_I****"
I [27/Oct/2015:11:39:24 -0400] [Job 112] Started filter /usr/libexec/cups/filter/pstoappleps (PID 81749)
I [27/Oct/2015:11:39:24 -0400] [Job 112] Started filter /usr/libexec/cups/filter/pstocupsraster (PID 81750)

With this information from the log file, we know everthing we need to know to re-create this process. The following commands will run the same commands that are executed automatically when the print button is used (after capturing the PostScript file that Acrobat created from the spool area):

#!/bin/bash
export PPD=/private/etc/cups/ppd/EPSON_WF_3520_Series.ppd
/usr/libexec/cups/filter/pstoappleps 106 khk sample.pdf 1 "AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=sample.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:80d6bf80-1651-3095-51f5-185048379779 job-originating-host-name=localhost time-at-creation=1445884058 time-at-processing=1445884058 PageSize=Letter"  sample.ps  > sample.eps 
/usr/libexec/cups/filter/pstocupsraster  106 khk sample.pdf 1 "AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=sample.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:80d6bf80-1651-3095-51f5-185048379779 job-originating-host-name=localhost time-at-creation=1445884058 time-at-processing=1445884058 PageSize=Letter"  sample.ps > sample.raster 
/usr/libexec/cups/filter/rastertourf  106 khk sample.pdf 1 "AP_ColorMatchingMode=AP_ApplicationColorMatching AP_D_InputSlot= nocollate com.apple.print.DocumentTicket.PMSpoolFormat=application/pdf com.apple.print.JobInfo.PMJobName=sample.pdf com.apple.print.PrinterInfo.PMColorDeviceID..n.=18333 com.apple.print.PrintSettings.PMCopies..n.=1 com.apple.print.PrintSettings.PMCopyCollate..b. com.apple.print.PrintSettings.PMFirstPage..n.=1 com.apple.print.PrintSettings.PMLastPage..n.=2147483647 com.apple.print.PrintSettings.PMPageRange..a.0..n.=1 com.apple.print.PrintSettings.PMPageRange..a.1..n.=2147483647 media=Letter pserrorhandler-requested=standard job-uuid=urn:uuid:80d6bf80-1651-3095-51f5-185048379779 job-originating-host-name=localhost time-at-creation=1445884058 time-at-processing=1445884058 PageSize=Letter"  sample.raster > sample.urf

So, what’s going on? We start out with a PostScript file and run it through pstoappleps, which seems to “normalize” the PostScript to something that Apple’s other utilities can work with. The next step is pstocupsraster, which is the actual PostScript interpreter that converts PostScript to the raster image, which is where the problem occurs. And the last step is rastertourf, which takes the raster image and wraps it into the URF format.

At the end of this script we end up with a URF file. In order to actually see what’s in the file, I convert that URF file to TIFF using the urftotiff utility from the urf_work project. We could also use Michael Sweet’s RasterView utility and look at the raster file before it gets converted to URF.

I can run this tool chain on many different PDF files without ever seeing the problem of missing characters, so what exactly triggers it? After analyzing a few files that end up with missing characters, it turns out that only text rendered with CID fonts is affected. At one point in time, it was pretty easy to create such files: Older versions of Adobe InDesign created PDF files with embedded CID fonts on a regular basis, and caused problems with non-Adobe RIPs in print workflows. Now it’s actually pretty complicated to end up with a CID font in a PDF file exported from InDesign. Interestingly enough, one of the test files I had access to was an InDesign file. I did however find a tool that always creates CID fonts in PDF files: wkhtmltopdf

I created a set of HTML files that I ran through wkhtmltopdf, and then merged in Adobe Acrobat into one PDF. Such a file, when printed to any printer on Mac OS that uses the tools mentioned above to convert PostScript to a raster format, will very likely show missing characters. My theory at this time is that when a certain CID font is subset embedded on multiple pages with different subsets, the Mac OS X PostScript to URF toolchain will not correctly interpret the subsets on each page – somehow it gets “stuck” with one subset, and if a page uses characters that are not represented in that subset, we end up with missing characters.

Here are the files that I’ve used: part1.html part2.html part3.html

Here are the individual PDF files and the merged file: part1.pdf part2.pdf part3.pdf merged.pdf

And finally, here is the PostScript file that Acrobat creates: sample.ps

Now that we know what causes this problem, is there anything we can do about it? Unfortunately, because most of the functionality is in Mac OS X proprietary files, we cannot fix the problem. All we can do is work around it by printing “as Image”, and wait for Apple to fix it on their end. I’ve tested this on both Mac OS X 10.10 (Yosemite) and 10.11 (El Capitan) with the same results.

Posted in Acrobat, Apple, PDF | Tagged , , , , | Leave a comment

Share Documents via Adobe’s Document Cloud

As many of you know, I am very active on the Adobe Acrobat User Community (http://www.acrobatusers.com). Oftentimes it’s necessary to actually see a PDF file in order to help somebody with a problem. There is no file sharing mechanism built into AcrobatUsers, which means that in order to share a file, it is necessary to use some other sharing service. One such option is Adobe’s own Document Cloud. The following shows how to upload a document to the Document Cloud, and to share it.

When you sign up for your Adobe ID, you get a Document Cloud account that comes with storage space. If you have a subscription to Acrobat DC, you get 20GB of storage, with a free account that is not connected to an Acrobat DC subscription, you get 5GB. This means you have 5GB of storage space available without paying a dime to Adobe.

To share a file, log into the Document Cloud at http://cloud.acrobat.com (1):

2015 10 22 12 14 00

Once logged in, go to the “File” tab (2) and switch to the “Document Cloud” category (3). You can now upload a file (4) by e.g. dragging and dropping the file into the upload area.

Once a file is uploaded, select that file (1):

2015 10 22 12 17 26

This will enable a panel on the right side that has (after scrolling a bit) the option to “Send & Track” (2). Sharing in the Document Cloud environment is called to send a file.

After selecting to send a file, you get the option to either share it via an anonymous link (with very limited tracking capabilities), or to send a personalized invitation with more detailed tracking. For now – because we want to share the link publicly – we select the anonymous link (1) and then click on the “Send” button (2).

2015 10 22 12 18 34

This will bring up a popup dialog with the link that allows people to access the file we just uploaded:

2015 10 22 17 24 27

If you want to share a document with an individual or a small group of people, select to send via the “Personalized Invitation”, but that requires a subscription – the cheapest one would be the “Send & Track” subscription for currently $19.99/year.

Posted in Acrobat, PDF, Tutorial | Tagged , , , | 2 Comments

Batch-Import Excel Data into PDF Forms

A while ago I documented for AcrobatUsers.com how to manually import an Excel data record into a PDF form. You can find this information here: Can I import data from an Excel spreadsheet to a fillable PDF Form?

This is very useful if you only have to deal with one or a few records that you need to import into PDF forms, but what if we are talking about 10s or 100s of records? It gets a bit boring to click on the same buttons again and again. There must be a way to automate this…

And, there is. The following gives you an idea about how to do this using JavaScript.

Anything I said about importing data manually in the link above is still true, so get familiar with the manual process and verify that you can actually import one data record from your text file into your PDF form. If that does not work, trying to automate this step will also fail.

The key to importing data from an Excel file is that you need to export the data as a “tab delimited text file” – just like described in the AcrobatUsers.com question I linked to above. Once you have such a file, you can use the Acrobat JavaScript method Doc.importTextData() to import one record at a time (just like we did manually before). Take a look at the documentation for this method: Doc.importTextData

There is a problem with this page from the documentation: The error codes use the wrong sign: All positive values are supposed to be negative and vice versa.

To import a whole spread sheet of data, we need to call this method for each record, and then save the file under a new name, an then move to the next record. This can be done e.g. in an Action. You can use the following script in an Action (or a custom command in Acrobat DC):

// specify the filename of the data file
var fileName = "/Users/username/tmp/data.txt";	// the tab delimited text file containing the data
var outputDir = "/Users/username/tmp/";    // make sure this ends with a '/'

var err = 0;
var idx = 0;
while (err == 0) {
	err = this.importTextData(fileName, idx);	// imports the next record

	if (err == -1)
		app.alert("Error: Cannot Open File");
	else if (err == -2) 
		app.alert("Error: Cannot Load Data");
    else if (err == 1)
        app.alert("Warning: Missing Data");
    else if (err == 2)
        app.alert("Warning: User Cancelled Row Select");
    else if (err == 3)
        app.alert("Warning: User Cancelled File Select");
	else if (err == 0) {
		this.saveAs(outputDir + this.getField("Text1").value + "_" + this.getField("Text2").value + ".pdf"); // saves the file
		idx++;
	}
}

There are two lines that actually do something: The line that is marked with ‘imports the next record’ is the one line that reads the record with the index “idx” from the file with the fielname “fileName”. And, the line with “saves the file” will save the open file under a new filename. You can get creative and use elements from the form to craete your new filename.

The only thing that’s a bit complex is the file and directory names in this script: Acrobat’s JavaScript uses “device independent paths”. What I’ve used in this script are paths on a Mac, if you are running Windows, the paths may look like this:

var fileName = "/c/temp/data.txt";
var outputDir = "/c/temp/output/";

A path of e.g. “C:\temp” gets converted to “/c/temp”. You can read up on device independent paths in the PDF specification.

This should get you started. If you have questions, as usual, post them in the comments.

Posted in Acrobat, JavaScript, PDF, Tutorial | Tagged , , , , , , | 113 Comments

How to Open a Document Attached to a PDF File

Have you ever wanted to attach a document (e.g. a MS Word file) to a PDF document, and give the user the ability to launch that file with just a click on a button?

Usually, you have to save the attachment to a file, remember where you saved it, then go to that location and open the file using your Windows Explorer or the Finder on a Mac. With the solution I am about to present, that gets much easier to do for the user, but a bit more complex for the author of the PDF file.

Let’s assume we have two files, one PDF file named document.pdf and a MS Word document named attachment.docx – in the following you just have to replace your filenames with the ones that I am using.

I am using Acrobat DC Pro (running on a Mac) for the following instructions, this will work the same way (with slightly different tool names and a different user interface with older versions of Acrobat as well). You will need Adobe Acrobat – either Standard or Pro – for this, the free Reader is not able to create such documents.

Open your PDF document and go to the “Attachments” pane on the left side of the Acrobat user interface. The “Attachments” pane is represented by the paper clip icon:

2015 10 14 09 24 53

If you don’t see this pane, select the following menu to show it:

2015 10 14 09 26 24

Once the “Attachments” pane is displayed, click on the menu icon as indicated in the following screenshot, and select to add an attachment:

2015 10 14 09 27 37

Now navigate to the file you want to attach, select it and click “OK”. This should now show you the new attachment in the “Attachments” pane:

2015 10 14 09 32 37

So far we have not done anything special – this is just the process to attach a file to a PDF document. We could stop here and let the user figure out that there is actually an attachment in the PDF file, and how to open it. But that’s not what we set out to do, we want to make it obvious for the user how to display this attached document.

To do that, we need to add a button to the PDF document. There are two ways to do that: We can either open the form editor and then add a button (and deal with everything that comes with actually being in the form editor), or we can just add an interactive button. If this document is a form that contains other form fields, we of course would use the button function in the form editor, but if this is just one button we need to add, there is an easier way.

Just in case you are not yet familiar with the Acrobat DC user interface, it can be hard to find the one function you are looking for. That is why Adobe added a tool search function to Acrobat DC. There are two ways to search for tools: You can either use the search field at the top of the “Right Hand Pane” (or RHP for short), or you can select the “Tools” tab and then use the search field at the top of that dialog:

2015 10 14 09 37 05

2015 10 14 09 37 29

When we type in “Button” in that search field, Acrobat will tell us where the button tool is:

2015 10 14 09 40 26

This actually gives us two tools that we need: The tool to add a button, and the tool to select that button again and to modify it (if we need to make adjustments).

For now, just click on the “Add Button” search result. This dumps us right into the “Rich Media” toolset, with the Button tool selected. This means we can now place the button on the PDF page by moving it around to the correct location and then clicking to place it.

2015 10 14 09 42 28

At this time, the button tool is still selected, and we can double-click on the button to bring up it’s Properties dialog. This is where we need to make changes to give this button the ability to launch the attached Word document.

2015 10 14 09 45 42

Select the “Actions” tab (1), then select to create a “Mouse Up” action (2), select to run a JavaScript (3) and click on the “Add” button (4). This will bring up the JavaScript editor. Here we have to add a one line script.

This script will call the Doc.exportDataObject() method. You can find more information about this JavaScript method here: Acrobat JavaScript API – Doc.exportDataObject()

The trick here is to use the “nLaunch” parameter set to the value “2”, which has the following descrption:

  • If the value is 2, the file will be saved and then launched. Launching will prompt the user with a security alert warning if the file is not a PDF file. A temporary path is used, and the user will not be prompted for a save path. The temporary file that is created will be deleted by Acrobat upon application shutdown.

The command we are using also needs to reference the attachment name, which in our case is the filename we’ve originally imported:

this.exportDataObject({ cName: "attachment.docx", nLaunch: 2 });

Now close the editor by clicking “OK”.

A button is not very useful without a label (or an icon) on it. You can make these changes on the “Options” tab of the “Properties” dialog. After that, close the “Properties” dialog using the “Close” button and close the “Rich Media” toolset by clicking on the “x” on the right side of the toolbar.

When you now click on the button, you should see the following security dialog:

2015 10 14 10 06 17

This is it. You have a button that opens a Word document.

If you are trying to launch an attachment that is already in the PDF document, the filename you need to use for the JavaScript code is the name that is being displayed in the “Attachments” pane.

If you want to edit the button properties again, bring up the tool search function again, search for “button”, and now select the “Select Object” tool. The button should change it’s look, and you can now double-click on it to bring up the properties dialog again. Or, because now we know that the button is in the “Rich Media” toolset, we can go to the “Tools” tab and select this toolset directly:

2015 10 14 10 13 30

After the third time you’ve selected this tool, you probably wish that there would be an easier method of selecting the “Rich Media” toolset, and there is: You can drag&drop the toolset icon into the RHP:

2015 10 14 10 13 55

From now on, this toolset is just one button click away:

2015 10 14 10 14 21

You can download a PDF file with an embedded Word document form here: launchAttachment.pdf

Posted in Acrobat, JavaScript, PDF, Programming, Tutorial | Tagged , , , , , , | 11 Comments

Apply Standard PDF Form FIeld Formatting/Keystroke/Validation Events to Fields via JavaScript

For some options in Acrobat’s form editor, you can select multiple fields and then apply the same option to all selected fields. This works for example for the “read-only” flag, or the display options. It does however not work for things like formatting/keystroke/validation/calculation scripts.

It’s relatively easy to assign e.g. a custom validation script to many form fields in a PDF form via JavaScript. I do that all the time to cut down on manually editing fields. The following script validates that a text field contains data in a specific format:

Let’s assume we want to use the following script for all fields that contain a product number in a specific format, three upper case characters followed by a dash and three or four digits:

var re = /^[A-Z]{3}-[0-9]{3,4}$/;

if (event.value != "") {
    event.rc = re.test(event.value);
}

To assign this to all fields that have a name that starts with “product.” (e.g. product.124, product.999 and so on), we can use the following script:

var script = "var re = /^[A-Z]{3}-[0-9]{3,4}$/;\n" +
	"if (event.value != \"\") {\n" +
	"\tevent.rc = re.test(event.value);\n" +
	"}";

for (var i=0; i<this.numFields; i++) {
    var fName = this.getNthFieldName(i);
    if (fName.indexOf("product.") == 0) {
        this.getField(fName).setAction("Validate", script);
	}
}

That’s straight forward, the only potential problem is that the script needs to be formatted so that it is a valid JavaScript string (e.g. by escaping all quotes and some other special characters, replacing line breaks with ‘\n’ and so on). But, what if we want to use not a custom validation script, but one of the built-in formatting functions or range validations that Acrobat supports. Take a look at this one:

2015 09 13 17 51 43

This is a standard numeric format option for a value with two decimal places. How can we apply that to a number of fields in one operation? Yes, we can reimplement this in JavaScript, but what if I don’t want to go through the trouble of doing that, especially because Acrobat already knows how to do that?

The good news is that behind the scenes, even these built-in functions are handled via JavaScript. We just never see the actual script because Acrobat actually parses the scripts, and if it recognizes one of the built-in functions, it does not display the custom script dialog, it just says “that’s a number with two decimals”…

So, how do we find the script that Acrobat applies in the background? More good news: The tool to do that is built right into Acrobat Pro as well (unfortunately, not into Acrobat Standard): It’s the pre-flight tool.

Let’s create a quick sample document, add one form field, and apply the formatting routine from the screenshot above. Now bring up Preflight (e.g. Tools>Print Production>Preflight in Acrobat XI and DC) and select the menu item “Browse Internal PDF Structure…” in the “Options” menu:

2015 09 13 17 52 22

This will show us the “guts” of the PDF file. For the following, we need to know that form fields are stored in the “Annoys” dictionary on the page level. The following screen shot shows the structure of the PDF file with the relevant dictionaries expanded:

2015 09 13 17 53 06

We are looking for the “Page>Annots>N>AA” dictionary entry with “N” being the annotation number. In this case – because we only have one form field in our document, this is straight forward: We are using the annotation #0. In the “AA” dictionary, we see a number of different entries. If we are dealing with a formatting command, we usually see two items: The actual format script and a keystroke script.

The “AA” dictionary entry is describes in table 220 in the PDF specification (ISO 32000-2008), which points to table 194 for an explanation of the different trigger events. For the following, we will only consider the formatting, keystroke, validation and calculation triggers. They are defined (in this order) by the keys “F”, “K”, “V” and “C”.

In this case, we see two entries for “F” and “K”. Both have to dictionary entries on their own: “S” and “JS”. The “S” key indicates what type of action is saved in this dictionary. In our case, “JavaScript” indicates that we have indeed a script, and the “JS” key contains the actual script.

We find these two scripts: The formatting script looks like this:

AFNumber_Format(2, 0, 0, 0, "", true);

And the keystroke script is this:

AFNumber_Keystroke(2, 0, 0, 0, "", true);

Both scripts call an internal function. We could now play around with the options on the formatting dialog to figure out what the different parameters mean. This old page on Planet PDF has some additional information: http://www.planetpdf.com/forumarchive/125041.asp

For what we want to do, it’s sufficient to know what the scripts actually are, without fully understanding what exactly they do: If it’s good enough for Acrobat, it’s good enough for me 🙂
– if you do that, just make sure that you do not modify anything in these scripts.

To assign these two scripts to the Format and Keystroke triggers, we can use the following few lines of code:

var f = this.getField("SomeField");
f.setAction("Format", "AFNumber_Format(2, 0, 0, 0, \"\", true);");
f.setAction("Keystroke", "AFNumber_Keystroke(2, 0, 0, 0, \"\", true);");

You can of course combine this with a look that looks for certain fields, matching a certain pattern and then apply this change only to those fields. This can be done in e.g. an Action, or a Custom Command (see my previous post about Custom Commands for more information).

We have not discussed the validation and calculation scripts that Acrobat might add. The process is the same, all we need to do is either look for the “V” key for a validation script, or the “C” key for a calculation script (e.g. a simple field notation script or one of the simple calculation methods).

This is a very simple way to automate something that otherwise requires quite a bit of clicking and pasting of information.

Posted in Acrobat, JavaScript, PDF, Tutorial | Tagged , , , , , , , | 1 Comment

Security Envelope Templates (Still) Missing in Mac Version

Adobe Acrobat has had an interesting feature since at least Acrobat 8: A “Security Envelope” allows the user to pack any document into such an “envelope” and encrypt the contents. Only the recipient with the correct “key” can decrypt these files. This is not limited to just PDF files, you can add Word, Excel, InDesign, and other files as well.

Take a look at this old tutorial from Adobe’s AcroLaw blog for more information about how to do that:
http://blogs.adobe.com/acrolaw/2007/04/safely_send_groups_of_files_usin/

In Acrobat XI and DC, you can find this feature under Tools>Protection>More Options>Create Security Envelope (for Acrobat XI, replace “More Options” with “More Protection”):

2015 09 13 15 38 08

The way this function works is as follows: You first select the files you want to “stuff” into your envelope, then you select what envelope template you want to use, and then you select the security settings you want to apply.

When you run this in the Windows version of Acrobat, you see this dialog when you are supposed to select the security envelope template:

2015 09 13 15 27 50

As you can see, there are three default templates listed. On a Mac, these templates are missing:

2015 09 13 15 29 21

This is not a new bug in Acrobat DC, these three default templates have been missing for a long time. All we can do is to browse the filesystem and point this dialog to a security envelope template that’s installed somewhere.

These missing templates are actually installed, but Acrobat does not list them. Once you know what template name to search for (e.g. because you poked around the Windows version of Acrobat), it’s not too complicated to find them. They are stored in this folder (for the English version, other localized versions have their own directory parallel to en.lproj:

/Applications/Adobe Acrobat DC/Adobe Acrobat.app/Contents/Resources/en.lproj/DocTemplates

We have two options if we want to use them:

  1. We can click on the “Browse” button and then navigate to this location in Acrobat’s application bundle. Because the files are stored in the application bundle, we need a second Finder window in which we display this folder, and then drag&drop the file we want to use into the “Envelope Templates” open dialog.
  2. Or, we can copy these template files to a directory in e.g. the user’s “Documents” folder, then click on the “Browse” button and navigate to that directory. That’s a bit more straight forward, because we only have to navigate into the application bundle once in order to copy the files to our desired target directory.

Because option #2 is so much easier to use, the following will discuss how to set up such a copy of the templates folder.

Open up a Finder window and navigate to e.g. the “Documents” folder and select to create a new folder (e.g. via File>New Folder). Rename this folder so that it uses a meaningful name (e.g. “Acrobat Security Envelopes”).

Now open a second Finder window and navigate to your Acrobat application directory (e.g. /Applications/Adobe Acrobat DC/). Here you will find a number of application bundles, one of them being “Adobe Acrobat.app”. Right click on that bundle and select “Show Package Content” from the menu:

2015 09 13 15 30 33

Now you can navigate to “/Applications/Adobe Acrobat DC/Adobe Acrobat.app/Contents/Resources/en.lproj/DocTemplates” and select the three templates and copy them to your newly created directory.

From now on, all you need to do when you want to apply a Security Envelope Template is to click on the “Browse” button and then navigate to e.g. “Documents/Acrobat Security Envelopes”.

Another tutorial on Adobe’s AcroLaw blog also demonstrates how you can create your own templates: http://blogs.adobe.com/acrolaw/2007/04/custom_security_envelopes/

Posted in Acrobat, PDF, Tutorial | Tagged , , | Leave a comment

How to Count Radio Button Choices in Acrobat JavaScript

A question that comes up every now and then on either AcrobatUsers.com or in the Acrobat JavaScript Forum is about how to count how many radio buttons with a certain value were selected by a user. Let’s for example say you have a survey that has 10 yes/no options, and you want to know how often a user selected “yes”, you would have to count the “yes” answers and then display them in a text field.

How can that be automated?

The first thing you should do is to use a naming convention for your radio button groups. In the following example, I use the following convention: All radio button group names start with “RBGroup”, that is then followed by a period and an index. This means I have “RGBroup.1”, “RBGroup.2”, and so on. This makes it easy to iterate over all groups without having to know their exact names, and how many of them there are.

The next thing you do is to make sure that all radio buttons options are the same for all the groups that you want to process. If you want to use “yes/no” options, make sure that all “yes” option are spelled the same way – “Yes”, “YES”, and “yes” are considered to be different.

Now create one or more text fields that you set to read-only (you don’t want the user to try to override your calculations) and then use the following script as the custom calculation script for each of these “count” fields:

var testFor = "Choice1";
var groups = this.getField("RBGroup");
var ar = groups.getArray();
var cnt = 0;
for (var i=0; i<ar.length; i++) {
if (ar[i].value == testFor) {
cnt++;
}
}
event.value = cnt;

The only thing left to do is to change the first line and specify what the script is testing for (e.g. “Yes”, “No”, “Choice1”, and so on.

If you don’t have a naming convention for your radio button groups, you can still use the same approach, but you would have to list your group names in an array and then process the array to use the getField() method on each array element. 

Posted in Acrobat, JavaScript, PDF, Tutorial | Tagged , , , , | 6 Comments