Extracting text from a PDF document is one of the most popular information retrieval functions. But what about other information such as images, metadata and more? It can be simple, but also tricky.
The traditional technical environment for a digital signature is the public key infrastructure (PKI). Digital signatures are also used to implement electronic money such as Bitcoin. However, Bitcoin uses a new technology, the blockchain. This new technical infrastructure can also be employed to sign documents. But what are the benefits?
Every developer of a PDF viewer, a PDF printer or a PDF-to-image converter comes across the requirement to render non-embedded fonts, which is quite a challenging task. Not only developers but also users of these tools might be interested in non-embedded fonts and how these tools treat them.
I often hear that the inline image construct is a major flaw in the design of the PDF page description language. Inline images are a frequently used feature in Type 3 fonts. However, the discomfort of some experts even led them to adjust this feature in the upcoming PDF 2.0 standard. What are inline images, and why do some programmers of PDF readers feel uncomfortable about them?
I often get asked whether it is possible to convert digitally signed documents to PDF/A. Because there is no short answer to this, I thought it would be helpful to explore the topic in a bit more detail.
When it comes to printing, all colors in a PDF document are transformed to the native color space of the printing device. If, for example, a text uses a black RGB color, it is transformed to an equivalent CMYK value which contains contributions from all four color channels. In mass printing applications in particular, however, these "rich black" values are not wanted, and "true black" colors which use the K channel only are required. This article gives some ideas on how this transform can be achieved.
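To make the idea of "true black" concrete, here is a minimal Python sketch. It is not the method described in the article: the function name `rgb_to_true_black_cmyk`, the `tolerance` parameter and the simple non-color-managed fallback formula are all assumptions made for illustration. A real print workflow would normally use ICC profiles for the non-neutral colors.

```python
def rgb_to_true_black_cmyk(r, g, b, tolerance=0):
    """Map an RGB color (0-255 per channel) to CMYK (0.0-1.0 per channel).

    Hypothetical helper: neutral colors (black and grays) are put on the
    K channel only, so they never turn into a four-channel "rich black".
    """
    # A color counts as neutral when its three channels are (almost) equal.
    if max(r, g, b) - min(r, g, b) <= tolerance:
        gray = (r + g + b) / (3 * 255)       # 0.0 = black, 1.0 = white
        return (0.0, 0.0, 0.0, 1.0 - gray)   # K channel only

    # Fallback for non-neutral colors: the simple device conversion
    # (assumed here; a color-managed conversion would give other values).
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)
    # k < 1 is guaranteed here because pure black was handled above.
    return ((c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k)


if __name__ == "__main__":
    print(rgb_to_true_black_cmyk(0, 0, 0))      # black text
    print(rgb_to_true_black_cmyk(255, 0, 0))    # pure red
```

The design choice in this sketch is to decide per color whether it is neutral before any conversion takes place, so that black and gray values bypass the general RGB to CMYK transform entirely.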
PDF is increasingly finding its way into mass printing applications. However, PDF spool files often ask too much of a print engine, resulting in aborts or, even worse, incomplete prints which may go unnoticed. What is special about PDF mass printing and what can be done about it?
JBIG2 is a compression algorithm for bitonal images and was developed to replace the widely used CCITT G4 algorithm because it can reach better compression ratios. However, the algorithm has gained a bad reputation, which has led some security experts to recommend not using it anymore. Is this wise advice or just an overreaction? How could it come to this?
Detecting pictures in scanned document pages has many advantages, such as better compression rates and the possibility of extracting them individually.
To reduce file size, PDF producers use a technique called font subsetting. What exactly happens to the fonts, and what are the consequences?