Does OCR make sense for digitally generated PDFs?

Scanned PDF files usually consist of one raster image per page. An OCR engine can recognize the text in these images and make the document searchable. But what about digitally generated documents?

Using native applications in a PDF document conversion service

Automated conversion of Office documents into PDFs has become a popular service. When designing the architecture of such a service, the question arises whether the native application or a purpose-built software library should carry out the conversion. The pros and cons are not obvious, so it's worth taking a closer look.

Importing images into a PDF file - a seemingly trivial task

A picture is worth a thousand words. That's why images are so often embedded in PDF files. One would expect embedding an image in a PDF file to be a simple task. Because it seems so easy, many tools, including free ones, are available for it. But do these tools do what you expect them to do?

A closer look reveals that embedding images is anything but trivial.

PDF 2.0 - A quick overview

It is rare for industrial products to survive for more than 20 years – especially in the IT industry. Not even the inventors of PDF could have imagined just how successful their file format would be when they launched the first version of Acrobat in June 1993. The members of the International Organization for Standardization (ISO) have been working on the next generation of this popular format.

How to deal with poor PDF quality

"Quality is remembered long after the price is forgotten", says a Gucci family slogan. Nevertheless, creators of PDF documents, private users up to large companies, regularly produce files with insufficient quality causing unexpected problems and cost in document processing steps. So, companies are forced to equip their document 'inbox' with a quality control system.