What is PDF optimization anyway?

Most people agree that PDF optimization has something to do with reducing file size. Sometimes it's about faster rendering. Either way, optimization is a broad topic that deserves a closer look.

Optimization has been an important subject since the beginning. The first three editions of the famous PDF Reference Manual devoted no fewer than four chapters to this topic. They contain very useful tips, which I highly recommend to all manufacturers of PDF software. But optimization is not only about creating optimal files. It also comes into play in other document processing functions, such as merging and print preparation.

Instead of going into too much detail, let me list the most important optimization areas:

  • Unused objects that have been replaced by newer versions, aren't used at all, or merely describe a default value may safely be removed.
  • Redundant objects that have an identical structure may be merged into one instance. However, this only works for specific object types such as resources and content streams. It doesn't apply to objects where each instance carries an implicit meaning of its own, such as page objects and tree elements (two identical page objects are still two pages).
  • Unwanted objects that are not needed for the intended purpose of the file can be discarded. For example, a file to be printed doesn't need article threads, web capture information and the like.
  • Images can be reduced in size by lowering their resolution to the target device's resolution or by using stronger compression algorithms. Especially for scanned images, techniques such as mixed raster content (MRC) may reduce the file size significantly. I will post a separate article about this later.
  • Embedded fonts can be optimized by building glyph subsets or by merging different instances of the same font program into one. Some print workflows require the embedded font program to be removed and replaced by the installed font.
  • Transparency is often a challenge for printer devices, and flattening it is a must before the document is converted to a printer language such as PostScript or PCL. Transparency flattening, that is, the process of replacing transparent objects with opaque ones, involves rasterization at device resolution in most cases. Flattening usually increases the file size.
  • Compression is a generic means to reduce the size of streams. Since PDF 1.5, objects and cross-reference information can also be put in compressed streams to save space. Good candidates are objects without a stream of their own that are not used for rendering, such as outlines and document structure.
  • Linearization adds information for fast web viewing and thus does not reduce the file size. I will post an article on this topic later.
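
To make the "redundant objects" item above a bit more concrete, here is a minimal Python sketch of how duplicates might be detected. It is not taken from any particular tool. The in-memory model (object number mapped to a dictionary plus optional stream bytes), the names `IDENTITY_TYPES`, `fingerprint` and `merge_redundant`, and the choice of excluded types are all assumptions made for illustration.

```python
import hashlib

# Assumption: object types where each instance matters even if the
# content is identical (two identical pages are still two pages).
IDENTITY_TYPES = {"Page", "Pages", "Outlines"}


def fingerprint(dictionary, stream):
    """Hash the sorted dictionary entries plus the stream payload."""
    h = hashlib.sha256()
    h.update(repr(sorted(dictionary.items())).encode("utf-8"))
    # Keep "no stream" distinct from "empty stream".
    h.update(b"N" if stream is None else b"S" + stream)
    return h.hexdigest()


def merge_redundant(objects):
    """Return (kept objects, remap of dropped number -> surviving number)."""
    first_seen = {}  # fingerprint -> lowest object number with that content
    remap = {}
    kept = {}
    for num in sorted(objects):
        dictionary, stream = objects[num]
        if dictionary.get("Type") in IDENTITY_TYPES:
            kept[num] = objects[num]  # never merge these
            continue
        key = fingerprint(dictionary, stream)
        if key in first_seen:
            remap[num] = first_seen[key]
        else:
            first_seen[key] = num
            kept[num] = objects[num]
    return kept, remap


# Hypothetical toy document: two identical pages, fonts and streams.
objects = {
    1: ({"Type": "Page"}, None),
    2: ({"Type": "Page"}, None),
    3: ({"Type": "Font", "BaseFont": "Helvetica"}, None),
    4: ({"Type": "Font", "BaseFont": "Helvetica"}, None),
    5: ({"Length": 5}, b"hello"),
    6: ({"Length": 5}, b"hello"),
}
kept, remap = merge_redundant(objects)
```

In this toy document the two fonts and the two streams collapse into one instance each, while both pages survive. A real optimizer would also have to rewrite every reference according to `remap` and repeat the pass, because objects that differ only in the references they hold become identical once those references have been merged.
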

In addition to the above, some tools offer more sophisticated but somewhat risky functions, such as:

  • Color conversion of text, paths and images is not as simple as it sounds. In particular, if transparency or overprinting is involved, color conversion can become a nightmare.
  • Invisible objects that are fully covered by other opaque objects or clipped away can be removed. This is not desirable if the document needs to be edited later.
  • Operator coalescence reduces the number of operators in a content stream and speeds up rendering.
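
The list above doesn't say which operators get coalesced, so the following Python sketch picks one plausible case as an illustration: two consecutive `cm` operators, which can be replaced by a single `cm` with the product of both matrices. The content stream model (a list of operand/operator pairs) and the names `concat` and `coalesce_cm` are assumptions made for this example, not the API of any real library.

```python
def concat(m1, m2):
    """Matrix equivalent to `m1 cm` followed by `m2 cm` (that is, m2 x m1).

    Matrices use the PDF form (a, b, c, d, e, f).
    """
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a2 * a1 + b2 * c1,
        a2 * b1 + b2 * d1,
        c2 * a1 + d2 * c1,
        c2 * b1 + d2 * d1,
        e2 * a1 + f2 * c1 + e1,
        e2 * b1 + f2 * d1 + f1,
    )


def coalesce_cm(ops):
    """Merge runs of directly adjacent `cm` operators into one."""
    out = []
    for operands, operator in ops:
        if operator == "cm" and out and out[-1][1] == "cm":
            # Fold this matrix into the previous cm instead of emitting it.
            out[-1] = (concat(out[-1][0], tuple(operands)), "cm")
        else:
            out.append((tuple(operands), operator))
    return out


# Hypothetical stream: translate by (10, 0), scale by 2, draw an image.
stream = [
    ((), "q"),
    ((1, 0, 0, 1, 10, 0), "cm"),
    ((2, 0, 0, 2, 0, 0), "cm"),
    (("Im1",), "Do"),
    ((), "Q"),
]
result = coalesce_cm(stream)
```

Here the two `cm` operators become the single matrix `2 0 0 2 10 0`, which maps a point (x, y) to (2x + 10, 2y), exactly as the original pair does. The sketch only merges operators that are directly adjacent. Anything in between, such as `q`, `Q` or a painting operator, ends the run, because the intermediate transformation may be visible there.
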

The list of optimization functions is far from complete. If an optimization function that matters to your application is missing, please post a comment and let me know.