Handling embedded and non-embedded fonts in PDF & PDF/A documents

Although the first part of the PDF/A standard was published in 2005, there is still a need for some clarification regarding fonts and embedding. What exactly does the standard require? How should PDF to PDF/A converters handle fonts? How do viewers actually deal with them, and how should they?

PDF validation with customer-specific extensions

While talking about PDF validation workflows, I often come across questions like "Can I let the validation fail if the paper format does not match our corporate rules?" This and other customer-specific requirements are indeed useful extensions to the pure file format and standard conformance tests.

Scan to PDF/A - some insights

Traditionally, a scanner produces a TIFF or JPEG image for each page. Some scanners can produce PDF files directly, and newer devices produce files conforming to the PDF/A standard. However, the quality of the produced files differs significantly. Why is this, and why is it worth using a central scan server?

Are the PDF/A space requirements a show stopper for archiving?

PDF/A requires that all resources such as fonts and color profiles be embedded in the file. Archiving transactional documents can be a nightmare because such documents are usually short by nature, so the archive ends up holding a huge number of copies of the same Frutiger font, sRGB color profile and company logo. Many archives therefore prefer TIFF over PDF/A when it comes to born-digital documents. But that is certainly not the idea of a uniform standard. How can this problem be solved?

What can I do about sliced images?

If I try to extract images from a PDF file it sometimes happens that I get a bunch of slices of the original image, mostly consisting of a few image rows per slice or, in extreme cases, just one row. Why is that and how can I get the entire image in one piece?

Automating the conversion of Microsoft Office Documents to PDF/A

A central service to convert Microsoft Office documents to PDF or PDF/A has obvious advantages. The conversion is done on an enterprise-wide platform with well-defined software versions and conversion process configurations. This guarantees consistent quality and makes the deployment and operation of client-based software unnecessary. The price for this, however, is that the central service must automate the native applications, such as Microsoft Word, which are designed for interactive use, not for server operation.

Can I trust PDF validation software?

If I use validation software from different manufacturers, I sometimes get different results. Why can this happen? Does it mean that I can't trust the software? What can I do about it? I hear these and similar questions very often, and I can understand users' concerns. In this article I try to shed some light on the mysteries of PDF validation.

The art of repairing damaged PDF files

I have sometimes had a small shock when I tried to open a PDF file and the viewer only showed an error message. In some cases the viewer can repair the file and save it, but often it cannot. This experience made me think about a repair tool.

Clearing up some myths about PDF printer drivers

I'm sure you have sometimes heard phrases like "a printer driver which is based on Windows GDI can only reproduce RGB colors" or "if a printer driver isn't built on top of the PostScript driver one cannot print EPS graphics". There is no question that these myths have persisted for as long as PDF producer software has existed. In this article I'd like to give you some background information that helps you understand how it really works.

The supreme discipline of converting PDF to PDF/A

We all know that converting one file format to another is not as easy as one might wish and can lead to unpleasant surprises. However, it is little known that this is also the case for the conversion from PDF to PDF/A. Why is that?

How to transform spot colors without ambiguities

On an RGB screen or a CMYK laser printer, spot colors cannot be displayed directly and must be emulated by converting them into their process color equivalents. The Separation and DeviceN color spaces provide tint transform functions to do so. However, with NChannel color spaces there are separate transforms for the individual components and for the collection of the components. Which one should be used?

How to preview overprinting

Some printing machines can print colors on top of others and some cannot. Some PDF viewers offer an optional preview function that simulates the effect of overprinting on a display screen. Since this function is not specified and therefore implemented differently or not at all, it causes some confusion among users.

Vertical writing - Not just a matter of fonts

In some writing systems you can place the characters vertically from top to bottom to form your sentences. Most users know this from Chinese, Japanese and Korean texts. What does it take to use the vertical writing mode in PDF? And what does it mean for a viewer to display the characters of vertical text correctly?

Font conversion - what's this good for?

The native font format of Microsoft Windows is TrueType. PostScript fonts are not very well supported on Windows, and the Type 1 format has even been discontinued. On the other hand, some embedded computers of printing machines have trouble with TrueType fonts and prefer PostScript fonts. One approach to circumvent these problems is to convert the font to the desired format, but this can be a challenge of its own.

What is PDF optimization anyway?

Most people agree that PDF optimization has something to do with reducing file size. Sometimes it's about faster rendering. Anyway, optimization is a wide topic and it's certainly worth a closer look.

Splitting and merging pages of PDF documents

Singling out pages from a number of input documents and rearranging them in a set of output documents is part of the daily routine in a document assembly application. At first glance, this seems to be a clear and understandable task. But PDF offers some special features that you should keep an eye on during assembly.

The PDF Forms Babylon - AcroForms and XFA

PDF forms are very popular among users. A tool programmer, however, has to choose between two different forms systems: AcroForm and XFA. Which one should you choose, when, and why?

How to avoid implementation dependent appearances of annotations and interactive form fields

Annotations such as notes and links are among the most appreciated PDF features. Although they are easy to use from a user's perspective, this is certainly not true from a developer's perspective, since it is often not clear how to render them.

Is the PDF version information a negligible detail?

Each PDF file starts with a header comment that carries version numbers. What do they mean? Can a reader ignore these numbers, or what should it do with them? Is there any other version information that a reader should take care of?
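To make the header comment concrete, here is a minimal Python sketch that reads the version numbers from the start of a file. It is only an illustration, not a complete reader: the function name `read_header_version` and the 1024-byte search window are my own assumptions (a strict reader would expect the header at the very first byte), and the sketch looks at the header only.

```python
import re
from typing import Optional, Tuple

# Matches a header comment such as "%PDF-1.7" and captures the
# major and minor version digits.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def read_header_version(data: bytes, search_window: int = 1024) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the PDF header comment, or None if absent.

    search_window is an assumed tolerance for leading junk bytes;
    it is a choice made for this sketch, not a rule.
    """
    match = _HEADER.search(data[:search_window])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A caller would typically read the first kilobyte of the file in binary mode and pass it to `read_header_version`; what the reader then does with the numbers is exactly the question raised above.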

Can linearization be combined with digital signatures and PDF/A conformance?

Linearization is a feature to optimize PDF files for sequential reading. Although it is very useful in web-based applications, it interferes with other features such as digital signatures.

Converting invoice documents to the ZUGFeRD data format

XML or PDF? That's one of the most frequently heard questions when it comes to invoice document formats. XML is the preferred format for machines, whereas PDF is the format for humans. But why not have your cake and eat it too? With ZUGFeRD you can have both documents in one file.

Why is the extraction of text from a PDF document such a hassle?

When I use a text editing tool such as Microsoft Word, it is quite natural that I can select a portion of text, copy it to the clipboard and paste it into a window of any other tool. Not so with PDF. At least not with every kind of document. Why is that?

A good PDF printer tool can be amazingly versatile

As the name suggests, a PDF printer tool is meant to put the contents of a PDF document onto paper. However, it is surprising how many applications can be built by using a printer tool in collaboration with the Windows Printing Subsystem.