7 Jun 2007

Working With PDF Page Content Streams

Posted by khk


As I mentioned last time, the “Browse Internal PDF Structure” feature that Adobe added to Acrobat Professional 8.0 can reveal some interesting information when it’s applied to a page content stream.

I assume that anybody interested in learning about the “guts” of a PDF document is familiar with the “PDF Reference”. The document can be downloaded from Adobe’s Developer Center.

Let’s take another look at the window that pops up when we select the “Browse Internal PDF Structure” functionality:

Screenshot 1

The tool bar contains the following items (from left to right):

  • Browse internal document structure
  • Browse internal structure by page
  • Find
  • Find Previous
  • Find Next

If “Browse internal document structure” is selected, that’s the end of the toolbar. For “Browse internal structure by page”, there are a few more buttons (the puzzle pieces):

Screenshot 2

  • View content stream by marked object
  • View content stream with q/Q nesting levels collapsed
  • View content stream with marked content blocks collapsed
  • View content stream with text blocks collapsed
  • View content stream with each marking object collapsed

So, what does this mean? Here are some screenshots for displaying the internal structure by page.

Screenshot 3

With the help of the PDF Reference, we can decipher what these elements in the page-level Cos dictionary are:

  • CropBox
  • Annots
  • Parent
  • StructParents
  • Contents
  • Rotate
  • MediaBox
  • Resources
  • Type

All these elements are described in section 3.6, Document Structure, of the PDF Reference. The most important ones are the CropBox, which defines which part of the page will be visible in a PDF viewer or rendered by a PDF or PostScript printer; the Contents element, which contains the page content stream; and the Resources dictionary, which contains the page resource elements such as fonts, images, color spaces, …
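To make the roles of these entries a bit more concrete, here is a minimal sketch that models a page dictionary as a plain Python dict. All values are made up for illustration, and the helper names `visible_box` and `box_size` are hypothetical. It relies on one rule from the PDF Reference: when a page has no CropBox, the MediaBox is used instead. The sketch ignores the fact that some of these entries can also be inherited from the Parent node.

```python
# Hypothetical page dictionary, modelled as a plain Python dict.
# The keys mirror the entries listed above; the values are invented.
page = {
    "Type": "Page",
    "MediaBox": [0, 0, 612, 792],    # full physical page in points (US Letter)
    "CropBox": [36, 36, 576, 756],   # region a viewer displays or prints
    "Rotate": 0,
    "Contents": "<content stream>",  # placeholder for the page content stream
    "Resources": {"Font": {}, "XObject": {}, "ColorSpace": {}},
}


def visible_box(page_dict):
    """Return the rectangle a viewer would show: CropBox, else MediaBox."""
    return page_dict.get("CropBox", page_dict["MediaBox"])


def box_size(box):
    """Return (width, height) of a [llx, lly, urx, ury] rectangle."""
    llx, lly, urx, ury = box
    return (urx - llx, ury - lly)


if __name__ == "__main__":
    print(visible_box(page), box_size(visible_box(page)))
```

For this sample page, the visible region is the CropBox, half an inch smaller than the MediaBox on every side.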

So far we have not seen any difference in how the data is represented, regardless of which toolbar button was selected. All puzzle buttons contain the term “content stream”, so let’s take a look at the page content stream by expanding the “Contents” tree item, and underneath it the “Content stream” element. The next screenshots show the different views:

View content stream by marked object
Screenshot 4

View content stream with q/Q nesting levels collapsed
Screenshot 5

View content stream with marked content blocks collapsed
Screenshot 6

View content stream with text blocks collapsed
Screenshot 7

View content stream with each marking object collapsed
Screenshot 8

So, what’s the difference between these different views?

Depending on what you are looking for, one or the other will make it easier to either find information, or to keep track of what’s going on when a page is rendered. More about this later…
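As a rough illustration of what grouping by q/Q nesting level means, here is a small sketch. It is not how Acrobat implements the view. It only relies on the fact that the `q` operator saves the graphics state and `Q` restores it, so matching pairs form nested blocks. The sample stream and the function name `outline_by_q_nesting` are made up, and the sketch assumes one operator per line, which real content streams do not guarantee.

```python
# Invented content stream; assumes exactly one operator per line.
SAMPLE_STREAM = """
q
1 0 0 1 72 720 cm
BT
/F1 12 Tf
(Hello) Tj
ET
q
0.5 g
10 10 100 50 re
f
Q
Q
"""


def outline_by_q_nesting(stream):
    """Return (depth, line) pairs, where depth counts enclosing q blocks."""
    depth = 0
    result = []
    for line in stream.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        operator = line.split()[-1]  # the operator follows its operands
        if operator == "Q":
            depth -= 1  # closing Q sits at the level of its opening q
        result.append((depth, line))
        if operator == "q":
            depth += 1  # everything after q is one level deeper
    return result


if __name__ == "__main__":
    for depth, line in outline_by_q_nesting(SAMPLE_STREAM):
        print("    " * depth + line)
```

Printing the result indents each operator by its nesting level, which is roughly the structure you would expand or collapse in the q/Q view.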

BTW: Since I first played with this feature, Adobe has made some documentation available online, and the related item in the help system points to it.
