The caveats of assembling PDF/A documents

Assembling PDF documents from various sources is a crucial part of an output management system. Because the assembled document needs to be archived in most cases, it should also conform to the PDF/A standard. But is there a way to assemble a document and achieve PDF/A conformance in one step?

An assembled document may start out as a raw transaction document originating from an enterprise resource planning system. Usually it is embellished with corporate identity elements and complemented with some white space advertising before it is sent to the customer. Or it might be a very complex set of FDA documentation for a drug development and approval process, containing thousands of pages of lab reports, clinical studies and the like.

Whatever the purpose of an assembled document might be, the common challenge is to create a document or a set of documents with a consistent appearance across all its parts. It should look as if it had been created by a single application. To achieve this, most output management systems use document assembly toolboxes which typically offer the following functions:
  • Merge documents from multiple sources
  • Insert empty pages (for duplex printing)
  • Insert pages which are created on-the-fly (table of contents, etc.)
  • Delete unnecessary pages
  • Sort pages in any order (booklet, reverse order, etc.)
  • Rotate pages (portrait, landscape)
  • Scale a page (shrink from A3 to A4, convert from Letter to A4, etc.)
  • Crop a page (make register and crop marks invisible)
  • Add page overlays and underlays (corporate identity)
  • Arrange multiple pages on one sheet (2-up, 4-up, 6-up, etc.)
  • Add content to a page such as OMR marks, bar codes, pagination, watermarks, etc.
  • Add XMP metadata to the document
  • Set the document's output intent color profile
  • Remove unnecessary features such as named destinations, tagging, etc.
Furthermore, since the assembled document is sent to a customer or another business partner, it has to be archived and must therefore conform to the PDF/A standard.
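
To make one of the functions listed above more concrete, here is a minimal Python sketch of the arithmetic behind scaling a page (for example, shrinking A3 to A4 or converting Letter to A4). It is an illustration only, not the API of any particular toolbox: the names `Placement` and `fit_page` are hypothetical, and the sketch assumes that the page should be scaled uniformly and centered on the target sheet.

```python
from dataclasses import dataclass

# Page sizes in PDF points (1 pt = 1/72 inch), rounded to two decimals.
A4 = (595.28, 841.89)
A3 = (841.89, 1190.55)
LETTER = (612.0, 792.0)


@dataclass(frozen=True)
class Placement:
    """Hypothetical result type: how to place a source page on a target sheet."""
    scale: float     # uniform scale factor applied to the source page
    offset_x: float  # horizontal offset that centers the scaled page
    offset_y: float  # vertical offset that centers the scaled page


def fit_page(source, target):
    """Scale `source` uniformly so that it fits inside `target`, then center it.

    Both arguments are (width, height) tuples in points.
    """
    src_w, src_h = source
    dst_w, dst_h = target
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source page must have positive dimensions")
    # The smaller of the two ratios guarantees that nothing is cut off.
    scale = min(dst_w / src_w, dst_h / src_h)
    return Placement(
        scale=scale,
        offset_x=(dst_w - src_w * scale) / 2,
        offset_y=(dst_h - src_h * scale) / 2,
    )


if __name__ == "__main__":
    print(fit_page(A3, A4))
    print(fit_page(LETTER, A4))
```

Shrinking A3 to A4 yields a scale factor of roughly 0.707 with almost no margin, whereas Letter to A4 is limited by the page width and leaves equal margins at the top and bottom. A real toolbox would apply the resulting values as a transformation matrix on the page content.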

In general, there are two ways to make an assembled document conform to PDF/A: 
  • Assemble the documents while disregarding their PDF/A conformance in a first step and then convert the result into a PDF/A document in a second step.
  • Assemble the document from sources that already conform to PDF/A, create new conforming content and process all parts such that PDF/A conformance is maintained.
The first approach might be easier to implement since it imposes fewer requirements on the quality of the source documents. In high-performance, high-volume applications, however, the second approach might be the only feasible solution.

One of the main challenges is consolidating the output intents. Each input file can have a different output intent, so all color spaces must be checked and adapted to reflect the new output intent before they can be used in the output document. There are many other challenges, such as handling fonts, but I will cover these topics in detail in separate articles.
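
The planning step behind output intent consolidation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular product works: `SourceDocument`, `choose_target_intent` and `plan_consolidation` are hypothetical names, output intents are represented by plain profile labels, the most frequent intent is picked as the target simply to minimize conversions, and a source without an output intent is assumed to use only device-independent color spaces.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceDocument:
    """Hypothetical description of one input file."""
    name: str
    output_intent: Optional[str]  # profile label, or None if the file has none


def choose_target_intent(sources, default="sRGB"):
    """Pick the most frequent output intent among the sources.

    Assumption: fewer conversions are preferable. Falls back to `default`
    when no source declares an output intent.
    """
    counts = Counter(
        doc.output_intent for doc in sources if doc.output_intent is not None
    )
    if not counts:
        return default
    return counts.most_common(1)[0][0]


def plan_consolidation(sources, target):
    """Map each source name to 'keep' or 'adapt'.

    'adapt' means the color spaces that rely on the source's own output
    intent must be adapted to the target before the pages can be reused.
    """
    return {
        doc.name: "keep" if doc.output_intent in (None, target) else "adapt"
        for doc in sources
    }


if __name__ == "__main__":
    sources = [
        SourceDocument("invoice.pdf", "sRGB"),
        SourceDocument("overlay.pdf", "FOGRA39"),
        SourceDocument("advert.pdf", "sRGB"),
        SourceDocument("toc.pdf", None),
    ]
    target = choose_target_intent(sources)
    print(target, plan_consolidation(sources, target))
```

The sketch only decides which inputs need attention. The actual adaptation of the color spaces is the hard part and is deliberately left out here.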

We have designed a component, the 3-Heights™ PDF Toolbox, which offers most of the above features and more. In addition, it is capable of creating PDF/A-conforming output documents assembled from multiple sources and from content generated on-the-fly.

I hope this article is useful. As usual, I would appreciate your feedback and comments.