PDF Content Streams

How Did You Get Here?

I did some research on why and how visitors come to my site. One interesting finding is that a number of people are searching for information about PDF content streams. Here is the list of the 50 most common Google searches that contain the string “content”:

1	pdf content streams
2	content streams pdf
3	acrobat content streams
4	pdf "content streams"
5	adobe content streams
6	content streams
7	acrobat +"content streams"
8	content streams acrobat
9	content streams in pdfs
10	what are content streams in pdf
11	"content stream" pdf issues
12	"content stream"+pdf
13	"content streams" pdf
14	"content streams"+"pdf"
15	"content streams"+adobe+pdf
16	acrobat content stream
17	acrobat content stream reduction
18	acrobat professional content streams
19	acrobat reduce content stream
20	acrobat what is a content stream
21	acrobat, "content stream"
22	content stream adobe acrobat pdf
23	content stream in acrobat pdf
24	content streams adobe pdf
25	content streams in a pdf
26	content streams in adobe
27	content streams in pdf
28	content streams pdf acrobat
29	how to create content stream in pdf
30	how to reduce adobe acrobat content stream?
31	how to reduce content streams acrobat
32	move content streams pdf
33	pdf "content stream"
34	pdf "content stream" example
35	pdf "content streams" help
36	pdf acrobat what is content stream
37	pdf stream content
38	pdf what are content streams
39	pdf what is content streams
40	pdf+content+streams
41	reduce content streams in acrobat professional
42	what are content streams in pdfs
43	what are content streams pdf
44	what are pdf content streams
45	what are pdf content streams?
46	what is a content stream in acrobat
47	what is a content stream in pdf
48	what is content stream acrobat
49	what is content stream in a pdf?
50	what pdf content streams

So, I guess you want to learn more about what PDF content streams are, and how to create them, with an example or two thrown in… I think I can do that.

In previous posts, I’ve shown you how to look into a content stream with the tools that Acrobat has on board. The good news is that with Acrobat 9 Professional (or Pro Extended), you no longer have to run a preflight first: the option to browse the internal structure of the PDF is available right away.

What does the PDF spec have to say about content streams?

PDF Specification

Section 3.7 in the PDF Specification covers content streams (and resource objects; the two travel together). There we read that “Content streams are the primary means for describing the appearance of pages and other graphical elements.” Section 3.7.1 goes into more detail. The name “content stream” already gives away an important piece of information: we are talking about stream objects. A content stream is a stream object whose data describes how a page will be rendered. If your recollection of stream objects is a bit fuzzy, please review section 3.2.7, “Stream Objects”, in the PDF spec. (Section 3.4.6, “Object Streams”, covers a different feature: streams that hold other, compressed PDF objects.)

When we look at a page object in a PDF document, we will see a number of required entries in the page object dictionary:

  • Type
  • Parent
  • Resources
  • MediaBox

Hmmm… This list does not include the Contents entry (which is what points to the content stream). This entry is optional: according to the spec, if Contents is absent, the page is empty. So an empty page in a PDF document does not need a content stream at all, which makes it very easy to add blank pages to a PDF file.

Back to the spec: Contents can be either a single stream or an array of streams. If it is an array, the effect is as if all of the streams in the array were concatenated, in order, into a single stream; the split between two streams may only fall on a boundary between tokens. It is up to the creating application to decide which way to go. In general, if the page content can be created in one operation, a single stream object is probably the best choice, whereas page content made of parts that were created at different times, or copied from other objects or locations, suggests an array of content streams.
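To make the array case concrete, here is a minimal Python sketch of how a reader can treat a Contents array as one logical stream. The helper name `concatenate_contents` is hypothetical, joining the parts with a newline is a precaution assumed by this sketch (so the last token of one stream cannot run into the first token of the next), and the font resource /F1 used by the stamp is assumed to exist in the page's Resources.

```python
def concatenate_contents(streams: list[bytes]) -> bytes:
    # A /Contents array behaves as if its streams were concatenated in order.
    # The newline separator is this sketch's precaution, keeping tokens at
    # stream boundaries from merging.
    return b"\n".join(streams)


# Hypothetical page content produced in two passes: the original artwork,
# plus a "DRAFT" stamp appended later as a separate stream.
artwork = b"q 0 0 1 rg 100 100 200 150 re f Q"
stamp = b"BT /F1 24 Tf 120 300 Td (DRAFT) Tj ET"
page_content = concatenate_contents([artwork, stamp])
print(page_content.decode("ascii"))
```

Because the streams form one logical sequence, the graphics state carries over from one stream to the next. That is why the artwork is wrapped in q/Q here: without it, the blue fill color set by `rg` would still be in effect when the stamp is drawn.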

Contents of a Content Stream

So, what exactly goes into a content stream? A content stream is a sequence of operands followed by operators (postfix notation, as in “100 100 200 150 re”, which appends a rectangle to the current path). The operators are listed in the “Operator Summary” (appendix A in the PDF specification), together with a reference to where each operator is introduced in the body of the PDF specification.

I don’t want to discuss every operator here (maybe in a future post; let me know if that’s something you want to see), but for reference purposes, and so that this stuff shows up if somebody googles for one or more of these operators, here is the list:

b,B,b*,B*,BDC,BI,BMC,BT,BX,c,cm,CS,cs,d,d0,d1,Do,DP,EI,EMC,ET,EX,f,F,f*,
G,g,gs,h,i,ID,J,j,K,k,l,m,M,MP,n,q,Q,re,RG,rg,ri,s,S,SC,sc,SCN,scn,sh,T*,Tc,Td,
TD,Tf,TJ,Tj,TL,Tm,Tr,Ts,Tw,Tz,v,w,W,W*,y,',"

Go and read up on those operators 🙂
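Because operands always come before their operator, reading a content stream mostly comes down to tokenizing it and collecting operands until an operator shows up. The Python sketch below does exactly that for an already uncompressed stream. The names `tokenize` and `parse_operations` are hypothetical, and the sketch deliberately skips inline images (BI … ID … EI), whose binary data needs special handling.

```python
import re

WHITESPACE = b" \t\n\r\f\x00"      # PDF white-space characters
DELIMITERS = b"()<>[]{}/%"         # PDF delimiter characters
NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)\Z")


def tokenize(data: bytes):
    """Yield raw tokens from an uncompressed content stream (no inline images)."""
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):                        # comment: skip to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == ord("("):                        # literal string, balanced parens
            depth, j = 1, i + 1
            while j < n and depth:
                if data[j] == ord("\\"):           # skip the escaped character
                    j += 2
                    continue
                depth += {ord("("): 1, ord(")"): -1}.get(data[j], 0)
                j += 1
            yield data[i:j]
            i = j
        elif data.startswith((b"<<", b">>"), i):   # dictionary delimiters
            yield data[i:i + 2]
            i += 2
        elif c == ord("<"):                        # hexadecimal string
            j = data.index(b">", i) + 1
            yield data[i:j]
            i = j
        elif c in b"[]":                           # array brackets
            yield data[i:i + 1]
            i += 1
        else:                                      # name, number or operator
            j = i + 1
            while j < n and data[j] not in WHITESPACE and data[j] not in DELIMITERS:
                j += 1
            yield data[i:j]
            i = j


def is_operand(tok: bytes) -> bool:
    """Names, strings, numbers and true/false/null are operands."""
    return (tok[:1] in (b"/", b"(", b"<")
            or NUMBER.match(tok) is not None
            or tok in (b"true", b"false", b"null"))


def parse_operations(data: bytes):
    """Group tokens into (operands, operator) pairs."""
    ops, operands, composite, depth = [], [], [], 0
    for tok in tokenize(data):
        if tok in (b"[", b"<<"):                   # start of an array/dict operand
            depth += 1
            composite.append(tok)
        elif tok in (b"]", b">>"):
            depth -= 1
            composite.append(tok)
            if depth == 0:                         # composite operand complete
                operands.append(b" ".join(composite))
                composite = []
        elif depth > 0:
            composite.append(tok)
        elif is_operand(tok):
            operands.append(tok)
        else:                                      # anything else is an operator
            ops.append((operands, tok.decode("latin-1")))
            operands = []
    return ops


sample = b"q 0 0 1 rg 100 100 200 150 re f Q BT /F1 12 Tf 72 700 Td [(Hel) -20 (lo)] TJ ET"
for operands, op in parse_operations(sample):
    print(f"{op:3} {[o.decode('latin-1') for o in operands]}")
```

Arrays and dictionaries (such as the TJ array above, or the property list of a BDC operator) are kept as a single composite operand, so every operator ends up paired with exactly the operands that precede it.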

Creating Content Streams

So, how do you create a content stream out of nothing? There are two ways: easy (relatively speaking!) and complicated.

Let’s take a look at the simple method first.

Using a Library or a Framework

The simplest approach to creating a content stream is to let somebody else do the heavy lifting: if you have a PDF library or a framework that allows you to create PDF content, you don’t have to worry about the details of what needs to go where in your content stream. Examples of such libraries are, of course, the Adobe PDF Library, the Acrobat API (take a look at the PDE level of API functions), PDFlib, or iText. Just get familiar with the environment and create PDF content streams as complicated as you need them to be, without too much hassle.

Manually Creating Content Streams

OK, before we go any further, let me ask you a question: why do you want to do this the hard way? Just stick to the approach described in the previous section and be done with it. There really are not many reasons to torture yourself with this stuff, so just get a nice library and enjoy life…

Still here? There are only two things I can tell you at this point: read the PDF spec (and then read it again), and when you create your first content stream, make sure you get the stream length right. The /Length entry in the stream dictionary must be the exact number of bytes of stream data between the stream and endstream keywords, not counting the end-of-line markers that separate the data from those keywords. If you are determined to learn how to do this from scratch, there is nothing I can say that will magically make it unnecessary to read (and understand!) the PDF spec. So, get started, and if you have questions, ask them in the comments to this article. Good luck.
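For the brave, here is a minimal Python sketch of a hand-built PDF, assuming nothing beyond the standard library. The function name `build_minimal_pdf`, the page size and the drawing operators are illustrative choices, not taken from any particular tool. It computes /Length from the actual content bytes and records the byte offset of every object for the cross-reference table. The second page has no Contents entry at all, so it is a blank page.

```python
def build_minimal_pdf(content: bytes) -> bytes:
    """Assemble a two-page PDF: page 1 draws `content`, page 2 is blank."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R 5 0 R] /Count 2 >>",
        # Page 1 points to its content stream (object 4).
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << >> /Contents 4 0 R >>",
        # /Length is the exact byte count of the stream data; the EOL after
        # 'stream' and the EOL before 'endstream' are not counted.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # Page 2 has no /Contents entry, so it is an empty page.
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset of "N 0 obj" for the xref
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    size = len(objects) + 1               # +1 for the free entry (object 0)
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"       # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


pdf = build_minimal_pdf(b"0 0 1 rg 100 100 200 150 re f")  # a blue rectangle
```

Writing `pdf` to a file (for example with `open("minimal.pdf", "wb").write(pdf)`) should give you a two-page document you can inspect with Acrobat's content stream viewer. Change a single byte of the content without updating /Length and the offsets, and you will see why getting the length right matters.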

 

Content Streams in PDF Files

So, what does a content stream in a PDF file look like? Here is an example:

 

13 0 obj
<<
	/Length 66/Filter/FlateDecode
>> stream
[some binary data]

endstream
endobj

This stream is obviously compressed, as indicated by the /Filter entry with the value /FlateDecode in the stream dictionary. Let’s take a look at the uncompressed stream:

The first image shows the content stream of a page using the “View content stream with q/Q nesting levels collapsed” option, and the second image uses “View content stream by marked object”. The important difference is that the first view shows the content stream operators themselves, whereas the second shows each operator without its parameters, followed by a description. To see the actual operators with their parameters in the second view, you need to expand the individual blocks.

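[Image: ContentStream_1.png, content stream with q/Q nesting levels collapsed]
[Image: ContentStream_2.png, content stream by marked object]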
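Acrobat decompresses the stream for you, but you can also inflate a FlateDecode stream yourself: the filter uses zlib/deflate compression, which Python's standard `zlib` module understands. The sketch below is a simplified illustration, not a real PDF parser. The name `decode_stream` is hypothetical, and it assumes a stream dictionary without nested dictionaries, a direct integer /Length (not an indirect reference such as “12 0 R”), and either no filter or FlateDecode alone.

```python
import re
import zlib

# Stream dictionary followed by the 'stream' keyword and its end-of-line.
# Assumptions of this sketch: no nested << >> in the dictionary, and a
# direct integer /Length.
STREAM_HEADER = re.compile(rb"<<(.*?)>>\s*stream\r?\n", re.S)


def decode_stream(obj: bytes) -> bytes:
    """Return the data of a single stream object, inflated if FlateDecode."""
    header = STREAM_HEADER.search(obj)
    if header is None:
        raise ValueError("no stream object found")
    stream_dict = header.group(1)
    length = re.search(rb"/Length\s+(\d+)", stream_dict)
    if length is None:
        raise ValueError("stream dictionary has no direct /Length")
    start = header.end()
    raw = obj[start:start + int(length.group(1))]   # exactly /Length bytes
    if b"/FlateDecode" in stream_dict:
        return zlib.decompress(raw)                  # FlateDecode = zlib/deflate
    return raw


# Build an object shaped like object 13 above, but with known content.
content = b"0 0 1 rg 100 100 200 150 re f"
compressed = zlib.compress(content)
obj = (b"13 0 obj\n<<\n\t/Length %d/Filter/FlateDecode\n>> stream\n" % len(compressed)
       + compressed + b"\nendstream\nendobj\n")
print(decode_stream(obj).decode("ascii"))  # prints: 0 0 1 rg 100 100 200 150 re f
```

Note that the sketch slices exactly /Length bytes rather than searching for “endstream”: compressed data may well contain that byte sequence, which is one more reason the length has to be right.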

Do you have any idea what this PDF page will look like? Here is the PDF document: test.pdf

This entry was posted in Acrobat, PDF.

One Response to PDF Content Streams

  1. I have a download section on my site with PDFs of approximately 21 MB each. I checked the PDFs and found that 95% of each file is taken up by content streams. Is there a way to reduce the content streams?

    I made these PDFs in CorelDRAW 13. They contain vector graphics and no text.

    Please advise.
