Digging for information by extracting data from a PDF document

Extracting text from a PDF document is one of the most popular information retrieval functions. But what about other information, such as images and metadata? It can be simple, but also tricky.

Using blockchains as an alternative to PKIs for digital signatures

The traditional technical environment for a digital signature is the public key infrastructure (PKI). Digital signatures are also used to implement electronic money such as Bitcoin. However, Bitcoin uses a new technology, the blockchain. This new technical infrastructure can also be employed to sign documents. But what are the benefits?

How to render the text of a PDF document if the font is not embedded?

Every developer of a PDF viewer, a PDF printer or a PDF-to-image converter comes across the requirement to render non-embedded fonts and faces quite a challenging task. Not only developers but also users of these tools may be interested in non-embedded fonts and how these tools treat them.

Inline images and Type 3 fonts

I often hear that the inline image construct is a major flaw in the design of the PDF page description language. Inline images are often used in Type 3 fonts. However, the discomfort of some experts even led them to adjust this feature in the upcoming PDF 2.0 standard. What are inline images and why do some programmers of PDF readers feel uncomfortable about them?

How to convert signed documents to PDF/A?

I am often asked whether it is possible to convert digitally signed documents to PDF/A. Because there is no short answer to this, I thought it would be helpful to explore the topic in a bit more detail.

Replacing rich black with true black in PDF documents

For printing, all colors in a PDF document are transformed to the native color space of the printing device. If, for example, text uses a black RGB color, it is transformed to an equivalent CMYK value which contains contributions from all four color channels. In mass printing applications in particular, however, these "rich black" values are unwanted, and "true black" colors, which use the K channel only, are required. This article gives some ideas on how this transformation can be achieved.
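
To make the idea concrete, here is a minimal sketch of one possible approach, not necessarily the one described in the article. It assumes RGB components in the range 0 to 1 and simply maps neutral (gray or black) RGB colors to a CMYK value that uses the K channel only. The function name and the tolerance parameter are hypothetical, and a real workflow would also have to take ICC profiles and the device's color management into account.

```python
def rgb_to_k_only(r, g, b, tolerance=0.0):
    """Map a neutral RGB color (components in 0..1) to K-only CMYK.

    Returns a (c, m, y, k) tuple, or None if the color is not neutral
    and should go through the normal color conversion instead.
    Illustrative sketch only: ignores ICC profiles and dot gain.
    """
    # A color counts as neutral when all three channels are (nearly) equal.
    if max(r, g, b) - min(r, g, b) > tolerance:
        return None
    gray = (r + g + b) / 3.0
    # Put the whole darkness into the K channel; leave C, M and Y empty.
    return (0.0, 0.0, 0.0, 1.0 - gray)


if __name__ == "__main__":
    print(rgb_to_k_only(0.0, 0.0, 0.0))   # pure RGB black
    print(rgb_to_k_only(0.5, 0.5, 0.5))   # mid gray
    print(rgb_to_k_only(1.0, 0.0, 0.0))   # red is not neutral
```

Colors for which the function returns None would be left to the regular RGB to CMYK conversion, so only black and gray content is forced onto the K channel.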

The problem with embedded fonts in PDF mass printing applications

PDF is increasingly finding its way into mass printing applications. However, PDF spool files often ask too much of a print engine, resulting in aborts or, even worse, incomplete prints which may go unnoticed. What is special about PDF mass printing and what can be done about it?

Will JBIG2 soon be banned?

JBIG2 is a compression algorithm for bitonal images. It was developed to replace the widely used CCITT G4 algorithm because it can reach better compression ratios. However, the algorithm has acquired a bad reputation, which has led some security experts to recommend against using it. Is this wise advice or just an overreaction? How could it come to this?