How Did You Get Here?
I did some research on why and how visitors come to my site. One interesting finding is that a number of people are searching for information about PDF content streams. Here is the list of the 50 most common Google searches that contain the string “content”:
1. pdf content streams
2. content streams pdf
3. acrobat content streams
4. pdf "content streams"
5. adobe content streams
6. content streams
7. acrobat +"content streams"
8. content streams acrobat
9. content streams in pdfs
10. what are content streams in pdf
11. "content stream" pdf issues
12. "content stream"+pdf
13. "content streams" pdf
14. "content streams"+"pdf"
15. "content streams"+adobe+pdf
16. acrobat content stream
17. acrobat content stream reduction
18. acrobat professional content streams
19. acrobat reduce content stream
20. acrobat what is a content stream
21. acrobat, "content stream"
22. content stream adobe acrobat pdf
23. content stream in acrobat pdf
24. content streams adobe pdf
25. content streams in a pdf
26. content streams in adobe
27. content streams in pdf
28. content streams pdf acrobat
29. how to create content stream in pdf
30. how to reduce adobe acrobat content stream?
31. how to reduce content streams acrobat
32. move content streams pdf
33. pdf "content stream"
34. pdf "content stream" example
35. pdf "content streams" help
36. pdf acrobat what is content stream
37. pdf stream content
38. pdf what are content streams
39. pdf what is content streams
40. pdf+content+streams
41. reduce content streams in acrobat professional
42. what are content streams in pdfs
43. what are content streams pdf
44. what are pdf content streams
45. what are pdf content streams?
46. what is a content stream in acrobat
47. what is a content stream in pdf
48. what is content stream acrobat
49. what is content stream in a pdf?
50. what pdf content streams
So, I guess you want to learn more about what PDF content streams are, and how to create them, with an example or two thrown in… I think I can do that.
In previous posts, I’ve shown you how to look into a content stream with the tools that Acrobat has on board. The good news is that with Acrobat 9 Professional (or Pro Extended), you no longer have to run a preflight first: the option to browse the internal structure of the PDF is available right away.
What does the PDF spec have to say about content streams?
Section 3.7 in the PDF Specification talks about content streams (and resource objects – the two travel together). Here we read that “Content streams are the primary means for describing the appearance of pages and other graphical elements.” Section 3.7.1 goes into more detail. The name “content stream” already gives away an important piece of information: we are talking about stream objects. A content stream is a stream object that describes how a page will be rendered. If your memory of what a stream object is happens to be a bit fuzzy, please review section 3.2.7, “Stream Objects”, in the PDF spec again.
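When we look at a page object in a PDF document, we will see a number of required entries in the page object dictionary:
- Type (always /Page)
- Parent (the page tree node that contains this page)
- Resources (required, but can be inherited from an ancestor in the page tree)
- MediaBox (required, but can be inherited from an ancestor in the page tree)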
Hmmm… This list does not include the Contents entry (which does point to a content stream). Because this entry is optional, a page does not need page content, so an empty page in a PDF document does not necessarily contain a content stream. This makes it very easy to add blank pages to a PDF file.
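To make the point about blank pages concrete, here is a minimal Python sketch that writes a complete one-page PDF whose page dictionary has only the four required entries and no /Contents. The function name build_blank_pdf and the default US Letter size (612 x 792 points) are illustrative choices, not anything prescribed by the spec. The cross-reference table uses the fixed 20-byte entries the file format expects.

```python
def build_blank_pdf(width=612, height=792):
    """Return the bytes of a one-page PDF whose only page has no /Contents entry.

    Hypothetical helper for illustration; width and height are in points.
    """
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # Only the required page entries: Type, Parent, Resources, MediaBox.
        # There is no /Contents entry, so the page is blank.
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 %d %d] >>"
        % (width, height),
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the result with open("blank.pdf", "wb") should give you a file that opens as a single empty page, even though it contains no content stream at all.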
Back to the spec: the value of Contents can be either a single stream or an array of streams. It is up to the creating application to decide which way to go. In general, if the page content can be created in one operation, a single stream object is probably the best choice, whereas page content made up of parts that are created at different times, or copied from other objects or locations, suggests an array of content streams.
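When Contents is an array, a reader treats the streams as if they had been concatenated in order. The Python sketch below shows that idea. The (data, filters) pairs are a hypothetical stand-in for whatever your PDF parser gives you for each stream, and only the FlateDecode filter is handled.

```python
import zlib


def combined_page_content(streams):
    """Decode each content stream of a page and join them in array order.

    `streams` is a list of (data, filters) pairs: the raw stream bytes and the
    filter names from its stream dictionary (e.g. ["FlateDecode"] or []).
    """
    parts = []
    for data, filters in streams:
        for name in filters:  # filters are decoded in the order listed
            if name != "FlateDecode":
                raise ValueError(f"filter not handled in this sketch: {name}")
            data = zlib.decompress(data)
        parts.append(data)
    # Joining with a newline keeps the last token of one stream from running
    # into the first token of the next one.
    return b"\n".join(parts)
```

For example, a compressed stream written when the page was first created and an uncompressed one appended later (say, by a tool that added a stamp) come back as one sequence of operators.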
Contents of a Content Stream
So, what exactly is the content of a content stream? It is a sequence of operators, each preceded by its operands. We find the complete list in the “Operator Summary” (Appendix A in the PDF specification), which names every operator together with a reference to the section where it is introduced in the body of the specification.
I don’t want to discuss every operator here (maybe in a future post – let me know if that’s something you want to see), but just for reference purposes, and so that this stuff shows up if somebody googles one or more of these operators, here is the list:
b,B,b*,B*,BDC,BI,BMC,BT,BX,c,cm,CS,cs,d,d0,d1,Do,DP,EI,EMC,ET,EX,f,F,f*,G,g,gs,h,i,ID,J,j,K,k,l,m,M,MP,n,q,Q,re,RG,rg,ri,s,S,SC,sc,SCN,scn,sh,T*,Tc,Td,TD,Tf,TJ,Tj,TL,Tm,Tr,Ts,Tw,Tz,v,w,W,W*,y,',"
Go and read up on those operators 🙂
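Every operator in this list comes after its operands, so a content stream can be read as a series of groups: zero or more operands followed by one operator. The Python sketch below tokenizes an uncompressed stream and groups the tokens that way. It is a simplification: inline images (BI ... ID ... EI) are not handled, dictionary operands such as those used with BDC come through as loose tokens, and any keyword not in the operator set is treated as an operand. The names tokenize and parse_operations are hypothetical.

```python
# The operators from the Operator Summary (Appendix A of the PDF specification).
OPERATORS = set("""
b B b* B* BDC BI BMC BT BX c cm CS cs d d0 d1 Do DP EI EMC ET EX f F f*
G g gs h i ID J j K k l m M MP n q Q re RG rg ri s S SC sc SCN scn sh
T* Tc Td TD Tf TJ Tj TL Tm Tr Ts Tw Tz v w W W* y ' "
""".split())

WHITESPACE = b" \t\r\n\f\x00"
DELIMITERS = b"()<>[]{}/%"


def tokenize(data):
    """Yield the tokens of an uncompressed content stream as bytes."""
    i, n = 0, len(data)
    while i < n:
        c = data[i:i + 1]
        if c in WHITESPACE:
            i += 1
        elif c == b"%":  # comment: skip to the end of the line
            while i < n and data[i:i + 1] not in b"\r\n":
                i += 1
        elif c == b"(":  # literal string; may contain balanced () and \ escapes
            start, depth = i, 0
            while i < n:
                ch = data[i:i + 1]
                if ch == b"\\":
                    i += 2
                    continue
                if ch == b"(":
                    depth += 1
                elif ch == b")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            yield data[start:i]
        elif data.startswith((b"<<", b">>"), i):
            yield data[i:i + 2]
            i += 2
        elif c == b"<":  # hexadecimal string
            end = data.index(b">", i)
            yield data[i:end + 1]
            i = end + 1
        elif c in b"[]{}":
            yield c
            i += 1
        else:  # name, number, or keyword: read up to whitespace or a delimiter
            start = i
            i += 1
            while i < n and data[i:i + 1] not in WHITESPACE + DELIMITERS:
                i += 1
            yield data[start:i]


def parse_operations(data):
    """Group operands with the operator that follows them: [(operands, op), ...]."""
    ops, stack = [], [[]]  # stack of operand lists; an array opens a nested list
    for tok in tokenize(data):
        text = tok.decode("latin-1")
        if text == "[":
            stack.append([])
        elif text == "]" and len(stack) > 1:
            array = stack.pop()
            stack[-1].append(array)
        elif text in OPERATORS and len(stack) == 1:
            ops.append((stack[0], text))
            stack = [[]]
        else:
            stack[-1].append(text)
    return ops
```

For example, parse_operations(b"0 0 1 rg 10 10 50 50 re f") returns [(["0", "0", "1"], "rg"), (["10", "10", "50", "50"], "re"), ([], "f")]: set a blue fill color, add a rectangle to the path, fill it.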
Creating Content Streams
So, how do you create a content stream out of nothing? There are two ways: an easy one (relatively speaking!) and a complicated one.
Let’s take a look at the simple method first.
Using a Library or a Framework
The simplest approach to creating a content stream is to let somebody else do the heavy lifting: if you have a PDF library or a framework that allows you to create PDF content, you don’t have to mess with the details of what needs to go where in your content stream. Examples of such libraries are, of course, the Adobe PDF Library, the Acrobat API (take a look at the PDE level of API functions), PDFlib, or iText. Just get familiar with the environment and create PDF content streams as complicated as you need them to be, without too much hassle.
Manually Creating Content Streams
OK, before we go any further, allow me a question: Why do you want to do this the hard way? Just stick to the approach mentioned in the last paragraph and be done with it. There really are not many reasons to torture yourself with this stuff, so just get a nice library and enjoy life…
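Still here? There are only two things I can tell you at this point: read the PDF spec, and read it again; and when you try to create your first content stream, make sure you get the stream length right. The /Length entry in the stream dictionary must be the exact number of bytes of stream data (after compression, if a filter is applied), counted from the line following the stream keyword up to, but not including, the end-of-line marker before endstream. If you are dedicated to learning how to do this from scratch, nothing I can say will magically make it unnecessary to read (and understand!) the PDF spec. So get started, and if you have questions, ask them in the comments to this article. Good luck.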
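To show what getting the length right means in practice, here is a Python sketch that wraps content-stream bytes in an indirect stream object and computes /Length from the bytes actually written (after compression, if any). make_stream_object is a hypothetical helper name; the resulting object would still have to be referenced from a page's /Contents entry and listed in the cross-reference table.

```python
import zlib


def make_stream_object(obj_num, content, compress=True):
    """Wrap content-stream bytes in an indirect stream object with a correct /Length.

    Hypothetical helper for illustration; obj_num is the object number to use.
    """
    data = zlib.compress(content) if compress else content
    filters = b" /Filter /FlateDecode" if compress else b""
    # /Length is the number of bytes between the EOL after "stream" and the
    # EOL before "endstream", i.e. the (possibly compressed) data only.
    header = b"%d 0 obj\n<< /Length %d%s >>\nstream\n" % (obj_num, len(data), filters)
    return header + data + b"\nendstream\nendobj\n"
```

Two classic mistakes are counting the characters of the uncompressed text when a filter is applied, and including the end-of-line marker before endstream in the count; computing the value from the final bytes avoids both.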
Content Streams in PDF Files
So, what does a content stream in a PDF file look like? Here is an example:
13 0 obj
<< /Length 66 /Filter /FlateDecode >>
stream
[some binary data]
endstream
endobj
This stream is obviously compressed, which is indicated by the /Filter entry with the value /FlateDecode in the stream dictionary. Let’s take a look at the uncompressed stream:
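If you want to look at the uncompressed data outside of Acrobat, FlateDecode data is in the zlib/deflate format, so Python's zlib module can undo it. The sketch assumes the object's bytes have already been cut out of the file, that /Length is a direct number (it can also be an indirect reference such as 14 0 R), and that FlateDecode is the only filter. decode_flate_stream is a hypothetical name.

```python
import re
import zlib


def decode_flate_stream(obj_bytes):
    """Return the decompressed data of one FlateDecode-encoded stream object.

    Assumes a direct /Length value and FlateDecode as the only filter.
    """
    length = re.search(rb"/Length\s+(\d+)(\s+\d+\s+R)?", obj_bytes)
    if length is None or length.group(2):
        raise ValueError("expected a direct /Length value")
    # The keyword "stream" is followed by CRLF or LF; the data starts right after it.
    kw = re.search(rb"stream(\r\n|\n)", obj_bytes)
    if kw is None or b"/FlateDecode" not in obj_bytes[:kw.start()]:
        raise ValueError("not a FlateDecode stream object")
    start = kw.end()
    raw = obj_bytes[start:start + int(length.group(1))]
    return zlib.decompress(raw)
```

Reading exactly /Length bytes, rather than searching for endstream, matters because compressed data is binary and could in principle contain any byte sequence.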
The first image shows the content stream of a page using the “View content stream with q/Q nesting levels collapsed” option, and the second image uses the “View content stream by marked object” option. The important difference is that the first image shows just the content stream operators, whereas the second image shows each operator without its parameters, followed by a description. To see the actual operators with their parameters in the second view, you need to expand the individual blocks.
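The q/Q nesting view groups operators by the graphics state save (q) and restore (Q) operators. A rough plain-text imitation takes only a few lines of Python: print one operation per line and indent everything between a q and its matching Q. show_nesting is a hypothetical helper, not how Acrobat builds its view, and its regular-expression tokenizer assumes literal strings contain no unescaped nested parentheses.

```python
import re

# Rough tokenizer: literal strings (no unescaped nested parentheses), dictionary
# delimiters, hex strings, array brackets, names, and everything else.
TOKEN = re.compile(
    rb"\((?:\\.|[^\\)])*\)|<<|>>|<[^>]*>|\[|\]|/[^\s/\[\]()<>{}%]*|[^\s/\[\]()<>{}%]+"
)
# Operators are bare keywords such as q, RG, re, T*, d0, or the quote operators.
OPERATOR = re.compile(rb"[A-Za-z'\"][A-Za-z0-9*]*")


def show_nesting(data):
    """Return one operation per line, indented by q/Q (save/restore) depth."""
    lines, operands, depth = [], [], 0
    for tok in TOKEN.findall(data):
        text = tok.decode("latin-1")
        if OPERATOR.fullmatch(tok) and text not in ("true", "false", "null"):
            if text == "Q":
                depth = max(depth - 1, 0)  # restore: step back out one level
            lines.append("  " * depth + " ".join(operands + [text]))
            if text == "q":
                depth += 1  # save: everything up to the matching Q is nested
            operands = []
        else:
            operands.append(text)
    return "\n".join(lines)
```

Feeding it b"q 1 0 0 RG q 0 0 50 50 re f Q 0 0 100 100 re S Q" shows the inner q ... Q block (the filled rectangle) indented one level deeper than the stroked rectangle around it.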
Do you have any idea what this PDF page will look like? Here is the PDF document: test.pdf