Handling embedded and non-embedded fonts in PDF & PDF/A documents

Although the first part of the PDF/A standard was published in 2005, some points regarding fonts and embedding still need clarification. What exactly does the standard require? How should PDF to PDF/A converters handle fonts? How do viewers actually deal with them, and how should they?

PDF/A Creation

Let us start with the easiest case. If you create a PDF/A document, then in general you have to embed all fonts it uses. This is true for any flavor of PDF/A, such as PDF/A-1b or PDF/A-3u. There is only one exception: if the text is not visible (text rendering mode 3), embedding is not required. Invisible text is often used to overlay a scanned page with the text from an OCR engine, so that the scanned document can be searched as if it were a born-digital document.
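
To make the rule concrete, here is a minimal sketch in Python. It assumes a simplified model in which the page content has already been reduced to pairs of font name and text rendering mode; the function name and the data layout are made up for illustration and do not come from any particular PDF library.

```python
INVISIBLE = 3  # text rendering mode 3: neither fill nor stroke the text


def fonts_requiring_embedding(text_runs):
    """Return the names of fonts used by at least one visible text run.

    text_runs is an iterable of (font_name, rendering_mode) pairs.
    This layout is a hypothetical simplification of real page content.
    """
    return {font for font, mode in text_runs if mode != INVISIBLE}


runs = [("Arial", 0), ("OCRFont", 3), ("Times", 2), ("Arial", 3)]
print(sorted(fonts_requiring_embedding(runs)))  # ['Arial', 'Times']
```

Note that a font drops out of the result only if every run using it is invisible. Arial still has to be embedded here because one of its runs is visible.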

If embedding is required, however, the font can be reduced so that it contains only the characters which the document actually uses. For example, if a document shows the single text string "help" in Arial, the embedded Arial font program can be reduced to contain only four characters. This process is called subsetting, and it is used extensively to reduce the size of the created file.
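
The first step of subsetting, collecting the characters that are actually used, can be sketched as follows. This is only an illustration of the idea: the function name is hypothetical, and a real subsetter works on glyphs rather than characters and typically keeps a few mandatory glyphs as well.

```python
def characters_to_keep(strings):
    """Collect the distinct characters that a subset font must cover."""
    return {ch for text in strings for ch in text}


kept = characters_to_keep(["help"])
print(len(kept), sorted(kept))  # 4 ['e', 'h', 'l', 'p']
```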

But creators must pay attention to characters which are composed from others, such as the German character "ä", which can be composed from the characters "a" and "¨". If the subset keeps the composed character but drops its components, the font program is incomplete. This is one of the sources of bad PDFs with incomplete font programs.
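
The following sketch shows one way to widen the set of used characters so that the components are kept as well. It uses the Unicode canonical decomposition from Python's standard unicodedata module as a stand-in for the font's own composite structure, which is an assumption: a real subsetter would follow the component references inside the font program itself, and Unicode decomposes "ä" into "a" plus the combining diaeresis U+0308 rather than the spacing character "¨". The function name is hypothetical.

```python
import unicodedata


def with_components(chars):
    """Add the canonical decomposition parts of every character to the set."""
    needed = set(chars)
    for ch in chars:
        # NFD splits a precomposed character into its base and combining marks.
        needed.update(unicodedata.normalize("NFD", ch))
    return needed


needed = with_components("\u00e4")  # U+00E4 is the precomposed "ä"
print([f"U+{ord(c):04X}" for c in sorted(needed)])
# ['U+0061', 'U+00E4', 'U+0308']
```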

If an embedded font is used to fill out text in form fields, the whole font must be embedded, since the creator doesn't know in advance which characters the user will eventually enter. From a technical point of view, the text remains editable only if the associated font is not subsetted. But there are also legal constraints.

The embedding and also the subsetting of a font are subject to the license of the font manufacturer. The majority of licenses grant the right to freely use the font for reproduction, such as viewing and printing, but restrict the creation and editing of text to the license owner. In any case, you should carefully check the license conditions before using a font to avoid legal issues.

TrueType and OpenType fonts contain usage rights information which tells the creator software whether a font may be embedded or not. Some creators obey these flags, others don't. Whatever this information says, it can only be regarded as a hint. In the end, the written license text which comes with the purchased font is the only decisive source of information.
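
In TrueType and OpenType fonts this information is stored in the fsType field of the OS/2 table. The sketch below decodes such a value in Python. The bit values are taken from the OpenType specification, not from this post; the function name and the returned dictionary are hypothetical. How to resolve a value with several level bits set is an assumption here (the sketch reports the least restrictive level), and, as said above, the result is a hint only.

```python
RESTRICTED = 0x0002     # restricted license embedding
PREVIEW_PRINT = 0x0004  # preview and print embedding
EDITABLE = 0x0008       # editable embedding
NO_SUBSETTING = 0x0100  # font may not be subsetted before embedding
BITMAP_ONLY = 0x0200    # only bitmaps contained in the font may be embedded


def describe_fs_type(fs_type):
    """Decode an OS/2 fsType value into a small, readable summary."""
    # Assumption: if several level bits are set, report the least restrictive.
    if fs_type & EDITABLE:
        embedding = "editable"
    elif fs_type & PREVIEW_PRINT:
        embedding = "preview_print"
    elif fs_type & RESTRICTED:
        embedding = "restricted"
    else:
        embedding = "installable"
    return {
        "embedding": embedding,
        "subsetting_allowed": not (fs_type & NO_SUBSETTING),
        "bitmap_only": bool(fs_type & BITMAP_ONLY),
    }


print(describe_fs_type(0x0104))
# {'embedding': 'preview_print', 'subsetting_allowed': False, 'bitmap_only': False}
```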

PDF to PDF/A Conversion

PDF to PDF/A converter software has to embed any fonts which are not already embedded. For a well-formed PDF input document this is not a problem. If the font is found (by name) in the installed font collection of the operating system, it is used. If it is not found, it is replaced by a font with characteristics similar to those of the requested font. Such substitutes are often synthesized from a generic font template for serif and sans-serif characters (Multiple Master fonts) rather than taken from the installed fonts.
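
The lookup described above can be sketched roughly like this. The sketch assumes that the installed fonts are available as a simple mapping from font name to file path and that the only characteristics consulted are the FixedPitch and Serif bits of the PDF font descriptor flags. The function name, the mapping and the names of the generic substitutes are hypothetical; real converters look at many more properties.

```python
FIXED_PITCH = 1 << 0  # bit 1 of the font descriptor /Flags entry
SERIF = 1 << 1        # bit 2 of the font descriptor /Flags entry


def resolve_font(name, flags, installed):
    """Pick a font program to embed for a non-embedded font.

    installed maps font names to file paths (hypothetical layout).
    Returns a (kind, value) pair.
    """
    if name in installed:
        return ("installed", installed[name])
    # Not found by name: fall back to a substitute with similar traits.
    if flags & FIXED_PITCH:
        return ("substitute", "generic-monospace")
    if flags & SERIF:
        return ("substitute", "generic-serif")
    return ("substitute", "generic-sans")


installed_fonts = {"Arial": "/fonts/arial.ttf"}
print(resolve_font("Arial", 0, installed_fonts))
# ('installed', '/fonts/arial.ttf')
print(resolve_font("Garamond", SERIF, installed_fonts))
# ('substitute', 'generic-serif')
```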

If the PDF input document is not well formed (e.g. if it contains non-embedded fonts which are symbolic, or CID fonts without a known CMap), the converter must use heuristics similar to those a viewer would use in such a situation. But since these algorithms aren't bulletproof, the result might not look as expected, or the conversion may even fail.

PDF/A Viewer 

A viewer (in general, any software which reads PDF files) may behave differently depending on whether the document claims to conform to the PDF/A standard or not. If the document carries the PDF/A label, the viewer is required to use the embedded fonts, whereas for a regular PDF document it may use the installed fonts instead. Using an installed font is usually faster than loading the embedded font from its compressed and possibly encrypted data stream. On the other hand, even fonts with the same name may look and behave differently.
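
As a rough sketch, the decision a viewer makes for each font could look like this. The function and its three boolean inputs are hypothetical simplifications. Preferring the installed font for a regular PDF reflects the speed argument above and is a choice a viewer may make, not a requirement.

```python
def choose_font_source(is_pdfa, has_embedded, installed_available):
    """Decide where a viewer takes the font program from."""
    # A document labeled PDF/A must be rendered with its embedded fonts.
    if has_embedded and (is_pdfa or not installed_available):
        return "embedded"
    # A regular PDF may be rendered with a same-named installed font.
    if installed_available:
        return "installed"
    # Nothing usable was found: fall back to a substitute font.
    return "substitute"


print(choose_font_source(True, True, True))   # embedded
print(choose_font_source(False, True, True))  # installed
```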

If I have left some questions open or raised new ones, please let me know and post a comment.

11 comments :

  1. Comprehensive post that makes me feel it's a bit like a jungle. Formally, the creator and sender of a PDF/A should know in advance which fonts are licensed to his recipients to prevent license violations... well, who would actually check? Unlike with software license control, I don't know of any software or OS that would block a file from display because it contains fonts not licensed on the host... could it be that a PDF/A display tool would replace a non-licensed font with a local one and thus break the claim of visual integrity?

    Replies
    1. Hi Bernard,
      Thank you for your comment. Yes, I agree. The font licensing problems are hard to keep under control. And, the font manufacturers must do their homework to clear some issues. Anyways, the responsibility to check proper licensing remains with the producer of the document not the viewer. A viewer must use the embedded font otherwise it does not conform to PDF/A.
