Connect to Database from PDF Form – This Time Without SOAP

I wrote about how to get data from (or to) a database from a PDF form using SOAP a while ago. Using SOAP poses a problem when you want to make such a solution work with the free Adobe Reader. In the past, Adobe had an ODBC interface built into the Windows version of Acrobat/Reader (named the ADBC interface), but that had the same problem as far as Reader goes, and was removed back in the days of Acrobat 9. So what can be done to connect a PDF form to a database in a way that also works with the free Reader? Be prepared for a long post that is for the most part about PHP running on a web server. You will need a web server that supports PHP if you want to follow along. I assume you know how to install PHP scripts on your web server, and also how to create PDF forms that submit data to a server.

The solution is to “talk” back and forth between the PDF form and the web server using XFDF. XFDF is the XML version of FDF, the “Form Data Format”, which is based on the PDF format (it’s a stripped down version of PDF). The FDF format can be used to submit form data from a PDF form to a web server, and to receive information back from the server. Reading and writing FDF is a complex task, and Adobe used to have the FDF Toolkit, which helped with these tasks, but this toolkit has not been updated since Acrobat 7 and is not supported by Adobe anymore. The XFDF format can do almost anything that can be done with FDF, but in a much easier to parse and to write format.

To take a look at what FDF and XFDF files look like, it’s easy to create them by exporting data from a PDF form using Adobe Acrobat: In Acrobat DC, load a form, then search for “export” in the tools search bar and select “Export data from a form file” in the “Prepare Form” category. You can then select the output format in the “Save” dialog (use either FDF or XFDF). Here is a sample FDF file (slightly reformatted to make it easier to read):

The corresponding XFDF file looks like this (again, slightly reformatted for human consumption):

For anybody with at least some XML background, it’s obvious that the XFDF file is much easier to understand and to parse. The FDF format is described here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/fdf_data_exchange.pdf, and the XFDF format is described in this document: https://www.yumpu.com/en/document/view/32927291/xml-forms-data-format-xfdf-specification-adobe-partners (I don’t have a link to the original Adobe hosted document, and I don’t know how reliable or trustworthy this service is).

Let’s take a look at what the XFDF document above contains: There is one top level XML node (as required by the XML standard) called “xfdf”, which contains three nodes: f, fields and ids. For now we can ignore the f node (which is just a reference to the PDF file this XFDF data came from, or should be imported into) and the ids node (that’s the document ID). The interesting “stuff” is happening in “fields”: It contains a list of field nodes, each one describing the data stored in a specific field. Each field node has a “name” attribute, and contains a value node with the actual data. In the example above, we can see that there is one field in the document named “Field 1”, which contains the string “test data”. Pretty simple.
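To make that structure a bit more tangible, here is a small PHP sketch that embeds an XFDF document matching the description above and lists its fields. The field name and value are the ones from the description; the href of the f node and the two ids values are placeholders I made up, and the function name listXfdfFields() is just for illustration.

```php
<?php
// Sample XFDF matching the description in the text. The href and the two
// ids values are made-up placeholders, not data from a real form.
$sampleXfdf = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <f href="form.pdf"/>
  <fields>
    <field name="Field 1">
      <value>test data</value>
    </field>
  </fields>
  <ids original="0123456789ABCDEF0123456789ABCDEF" modified="FEDCBA9876543210FEDCBA9876543210"/>
</xfdf>
XML;

/**
 * Return an array that maps each field name to its value
 * (null if a field has no value node).
 */
function listXfdfFields(string $xfdf): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xfdf);

    // XFDF elements live in Adobe's XFDF namespace, so XPath needs a prefix.
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('x', 'http://ns.adobe.com/xfdf/');

    $fields = [];
    foreach ($xpath->query('/x:xfdf/x:fields/x:field') as $field) {
        $value = $xpath->query('x:value', $field)->item(0);
        $fields[$field->getAttribute('name')] = $value !== null ? $value->textContent : null;
    }
    return $fields;
}

print_r(listXfdfFields($sampleXfdf));
```

The sketch only looks at top-level fields; hierarchical field names (a field node nested inside another field node) would need a recursive walk.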

As mentioned before, this file was exported from a form, so we can see what data was actually entered in the form, but the same approach can also be used to import data into the form. I can for example change the value node to now contain the string “new data”. When I now use the import function in Acrobat, I can fill my form with this updated data.

What we’ve done so far by manually importing and exporting can be automated. To export data we can for example use the “submit a form” action on a button – or the Doc.submitForm() JavaScript method. Both methods allow us to specify the format we want to submit our data in. We are looking at XFDF, so let’s select XFDF as the form’s submission format.

Data is usually submitted to a web server (it can also be emailed, but for automation purposes, that just complicates things). Before we can actually submit the data from a form, we need to take a look at how we can receive that data on the server. The following example will use PHP on an Apache server, but you should be able to adapt the solution to any other server setup.

The simplest PHP script that accepts data without actually processing it (the data is only stored in a variable that we never use again) is this:
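A minimal sketch of such a script could look like the following. I am assuming here that the submitted XFDF arrives as the raw request body and is read via the php://input stream; the variable name $xfdf is arbitrary.

```php
<?php
// Acrobat posts the XFDF document as the raw body of the HTTP request.
$xfdf = file_get_contents('php://input');

// The data is deliberately ignored; we only confirm that something arrived.
echo 'Received some data.';
```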

When we now submit our form data to this PHP script on a web server (I assume you know how to set up PHP scripts on your server) by clicking the submit button in Adobe Acrobat, we get a new PDF file that reads “Received some data.” It looks like our data made it to the server, but we don’t yet have a way to get information back.

When we make things a little bit more complex, we can actually see the submitted XFDF:
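A sketch of that slightly extended script, again assuming the body is read from php://input, needs just one more line:

```php
<?php
// Read the submitted XFDF from the raw request body.
$xfdf = file_get_contents('php://input');

// Replace every "<" with its entity so the XML tags survive as visible text.
$xfdf = str_replace('<', '&lt;', $xfdf);

echo $xfdf;
```

PHP's htmlspecialchars() would be the more thorough choice for anything beyond a quick test, because it also escapes "&" and quotes.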

What are we doing here? The data that is sent back (which is just text) gets interpreted by Acrobat as HTML, which means that it will filter out all the XML tags. In the additional line of PHP code we just added, every “<” gets replaced with its entity string (“&lt;”), and that makes all the XML visible. After submitting the form data again, we see the following in Acrobat:

[Screenshot: PDF document in Adobe Acrobat that shows the XML code.]

We actually see two form fields in this XFDF data structure: The submission button is reflected as well.

Our problem now is that Acrobat is creating a new PDF file with the contents of the data we are sending back. Before we find a way to avoid that, let’s take a look at how Reader handles this scenario:

Acrobat can convert from HTML (or text) to PDF, and that is what is happening here: Acrobat receives a message back from the web server and it converts that to PDF and opens that new PDF as a new document. The free Adobe Reader cannot convert from HTML or text to PDF, so when you try this in Reader, you will end up with an error message.

How can we get data back into our form? The key here is that Acrobat will expect an FDF or XFDF data structure in the reply to a form submission if we append “#FDF” or “#XFDF” to the URL. So, if we so far just used “http://localhost/XFDFTesting/submit.php”, we need to adjust that URL to read “http://localhost/XFDFTesting/submit.php#XFDF”.

The good news here is that once we force the server’s reply back into the same document using this mechanism, Reader will be happy too. That means we can create a connection between a PDF form opened in Reader and a database (as long as we provide this PHP “glue” in between the two), and we can do that without having to apply special rights to the PDF document. This is not something that was possible with the old ADBC mechanism, nor is it possible when using the current SOAP implementation.

But, to do that, we need to reply with valid XFDF data. Let’s try this with a hardcoded response:
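A sketch of such a script could look like this. The replacement value “new data” is just a placeholder I picked, and the field name has to match a field in your form.

```php
<?php
// Hard-coded XFDF reply that sets "Field 1" to a fixed (placeholder) string.
$reply = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="Field 1">
      <value>new data</value>
    </field>
  </fields>
</xfdf>
XML;

// Announce the reply as XFDF before any output is sent.
header('Content-type: application/vnd.adobe.xfdf');
echo $reply;
```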

Besides the hardcoded string, we are also setting the content type of the reply to “application/vnd.adobe.xfdf”, which tells either the browser, or Acrobat (or the free Adobe Reader) that the reply contains XFDF data.

When we now submit our form, the server will reply with an updated value for “Field 1”, and Acrobat in turn will update that field’s value.

Security

If you followed along so far, you’ve probably noticed that Acrobat will not just allow you to submit data to a web site: it will prompt the user to select whether the operation should be allowed once or forever, and will then force the user to click on the submission button again to actually send the data to the server. When you create a solution based on this technology, please inform your users about what to expect when clicking the submission button for the first time, and how to proceed.

Reading and Setting Data

At this point, we have almost all the parts that we need to create a solution that reads and processes information submitted by the form, and replies with a data record that Acrobat (or Reader) can use to populate fields in a form. To make things comparable to the SOAP implementation I presented earlier, I will again create and retrieve a unique number from a database so that e.g. a form can be labeled with that unique number. You may want to visit my earlier post for some background information.
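How the unique number gets created depends entirely on your database. As one possible sketch, assuming PDO with an SQLite database and a hypothetical table named form_numbers (neither is prescribed by anything in this post), an auto-incrementing key can serve as the unique number:

```php
<?php
/**
 * Create a new row and return its auto-incremented id as the unique number.
 * The table name "form_numbers" and the SQLite syntax are assumptions.
 */
function nextUniqueNumber(PDO $db): int
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS form_numbers (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created TEXT
        )'
    );
    $db->exec("INSERT INTO form_numbers (created) VALUES (datetime('now'))");

    return (int) $db->lastInsertId();
}

// An in-memory database keeps the example self-contained; a real setup
// would point the DSN at a file or at a database server instead.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

echo nextUniqueNumber($db), "\n";
```

Letting the database hand out the number avoids two simultaneous submissions receiving the same value.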

What is still missing is only plain PHP – nothing specific to Acrobat or the PDF environment: We need to parse the XFDF data we receive from Acrobat (remember, this is just XML, so anything that can parse XML will do), and we need to create a response in valid XFDF with potentially updated information.

Here is a PHP script that adds these two missing features:

This is quite a bit more complex than the simple sample scripts we’ve used so far. There are two key sections in this PHP script:

When the original form submission is processed, we need to parse the XFDF data and extract the information we need to create a unique number. This is done by creating a new DOMDocument ($myXFDF) and initializing that with the submitted form data. Then, we use XPath constructs to retrieve the actual data. You can read up on how to use the DOMDocument object here: http://php.net/manual/en/class.domdocument.php
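The parsing step might be sketched like this. The variable name $myXFDF is the one used above; the function name getSubmittedValue() and the way the field is looked up are my own choices for this illustration.

```php
<?php
/**
 * Return the value of one field from a submitted XFDF document,
 * or null if the document cannot be parsed or the field is missing.
 */
function getSubmittedValue(string $xfdf, string $fieldName): ?string
{
    // Collect XML errors quietly instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $myXFDF = new DOMDocument();
    if ($xfdf === '' || !$myXFDF->loadXML($xfdf)) {
        return null;
    }

    $xpath = new DOMXPath($myXFDF);
    $xpath->registerNamespace('x', 'http://ns.adobe.com/xfdf/');

    // Compare names in PHP rather than inside the XPath expression, so that
    // quotes in a field name cannot break the query.
    foreach ($xpath->query('/x:xfdf/x:fields/x:field') as $field) {
        if ($field->getAttribute('name') === $fieldName) {
            $value = $xpath->query('x:value', $field)->item(0);
            return $value !== null ? $value->textContent : null;
        }
    }
    return null;
}

// In the real script the document would come from the request body:
// $value = getSubmittedValue(file_get_contents('php://input'), 'Field 1');
```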

Once we have the updated data, it needs to be wrapped in XFDF again, and this is done in the function createXFDF() – we pass an array that maps field names to the fields’ values into this function. There are two optional arguments to this function, which we will ignore for now. There will be a future post about how to use the $file parameter.
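A simplified sketch of what such a createXFDF() function could look like follows. It only takes the array of field names and values; the two optional arguments are left out because they are not described here. Building the document with DOMDocument rather than by string concatenation is my choice, because it takes care of escaping characters like “<” and “&” in the values.

```php
<?php
/**
 * Wrap an array of field name => value pairs in an XFDF document.
 * Simplified sketch: the optional arguments of the full function are omitted.
 */
function createXFDF(array $fields): string
{
    $ns = 'http://ns.adobe.com/xfdf/';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->createElementNS($ns, 'xfdf');
    $doc->appendChild($root);

    $fieldsNode = $doc->createElementNS($ns, 'fields');
    $root->appendChild($fieldsNode);

    foreach ($fields as $name => $value) {
        $field = $doc->createElementNS($ns, 'field');
        $field->setAttribute('name', (string) $name);

        // A text node escapes special characters in the value automatically.
        $valueNode = $doc->createElementNS($ns, 'value');
        $valueNode->appendChild($doc->createTextNode((string) $value));

        $field->appendChild($valueNode);
        $fieldsNode->appendChild($field);
    }

    return (string) $doc->saveXML();
}

// Reply to the form submission with the updated field values.
header('Content-type: application/vnd.adobe.xfdf');
echo createXFDF(['Field 1' => 'new data']);
```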

I am not making any files available for this little project, because they would depend on the actual implementation of your solution (e.g. where on your server the scripts are stored), but it should be fairly straightforward for anybody with PHP experience to create such a solution and deploy it. If you need help with your implementation, I do provide this service as part of my consulting business, so feel free to get in touch with me via email.
