The art of repairing damaged PDF files

I have occasionally had a small shock when I tried to open a PDF file and the viewer showed only an error message. In some cases the viewer can repair the file and save it, but often it cannot. This experience made me think about a repair tool.

There may be various reasons for a PDF file being corrupt or damaged. Here are the most common:
  • Systematic errors caused by bad creator or processing software
  • Unwanted modification of the PDF stream caused by storage and communication software (e.g. the so-called ASCII mode of Source Safe, FTP, etc.)
  • Data loss or corruption by an unreliable communication line or a disk crash
If the repair tool is aware of the nature of corruption it can carry out specific fixes:
  • Systematic errors: If the creator software is known then the repair tool can find and fix typical errors such as missing mandatory dictionary entries, systematically wrong object structures such as names instead of strings, badly formatted fonts, and the like.
  • Unwanted modification: A PDF file can be created in ASCII mode. If this is the case then modifications such as inserted line breaks or carriage returns replaced by the combination carriage return / line feed are not critical. If the PDF file is created in binary mode then the stream objects can most likely not be recovered.
  • Data loss or corruption: In this case the file cannot be repaired, but the information which is still valid, such as scanned pages, can be recovered and put into a new output file.
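
As a rough illustration of how a tool might guess whether a file claims to contain binary data, the sketch below looks for the marker comment that PDF writers conventionally place on the second line of such a file (a comment line with at least four bytes of value 128 or greater). The function name `declares_binary_content` is made up for this example, and the marker is only a hint, not proof of how the file was actually written or transferred.

```python
def declares_binary_content(data: bytes) -> bool:
    """Return True if the line after the %PDF header is a comment
    that holds at least four bytes with a value of 128 or more."""
    # bytes.splitlines() breaks on LF, CR and CR/LF, so the check still
    # works after a line-ending conversion has touched the header.
    lines = data[:1024].splitlines()
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    return second.startswith(b"%") and sum(b >= 128 for b in second) >= 4
```

A file without the marker is a better candidate for undoing line-break changes; a file with it probably carries binary streams that such changes have damaged.
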
A repair tool usually proceeds in two steps:
  1. Analysis: Check the file header and trailer, the cross reference table, the individual objects, and the root object, page tree and related data structures.
  2. Repair or recovery: If the analysis shows that the root object can be found and the page tree is intact then the tool can repair the file. If not, it must recover as much as possible from the file and create a new one.
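
The first part of the analysis step could look roughly like the sketch below. It only runs a few cheap checks on the raw bytes: the header, the end-of-file marker, the `startxref` offset, and whether a `/Root` reference appears anywhere. The function name `analyze_pdf` and the keys of the report are assumptions made for this example. Checking the individual objects and the page tree is left out.

```python
import re

def analyze_pdf(data: bytes) -> dict:
    """Run a few cheap structural checks on raw PDF bytes."""
    tail = data[-1024:]  # the trailer information sits near the end of the file
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    xref_offset = int(offsets[-1]) if offsets else None
    return {
        "header_ok": data.startswith(b"%PDF-"),
        "eof_marker": b"%%EOF" in tail,
        "xref_offset": xref_offset,
        # Only a classic table starts with the keyword "xref"; a cross
        # reference stream would need a separate check.
        "xref_table_found": xref_offset is not None
        and data[xref_offset:xref_offset + 4] == b"xref",
        "root_referenced": re.search(rb"/Root\s+\d+\s+\d+\s+R", data) is not None,
    }
```

A report in which `xref_table_found` is false but `root_referenced` is true would suggest that a repair is still worth trying.
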
One of the most common kinds of damage is an invalid cross reference table. In general this is not critical: the tool can recover the table by scanning through the file and rebuilding it from the objects it finds. If an object occurs more than once then the latest one is used.
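
A minimal sketch of such a scan is shown below, assuming the objects are stored uncompressed in the file body (objects packed into object streams are not found this way). It searches for `N G obj` headers and records the byte offset of each one, and a later occurrence of the same object number overwrites an earlier one. The name `scan_objects` is made up for this example. Stream data that happens to contain something resembling an object header would produce a false match, so a real tool would have to validate each hit.

```python
import re

# Matches an object header such as "12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")

def scan_objects(data: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (generation, byte offset) by scanning the file."""
    table: dict[int, tuple[int, int]] = {}
    for match in OBJ_HEADER.finditer(data):
        number = int(match.group(1))
        generation = int(match.group(2))
        # Later matches overwrite earlier ones, so the latest copy wins.
        table[number] = (generation, match.start())
    return table
```

The resulting mapping holds the information needed to write a fresh cross reference table.
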

Repairing the PDF object structure is quite straightforward. Some errors are not critical, such as a missing /Type /Page entry in a page node; the tool can just add the missing entry. However, a missing /BBox [...] entry in a form XObject is fatal and the tool must either try to recover it from the graphics contained in it or remove the object.
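
To make the difference between the two kinds of error concrete, here is a small sketch that works on an already parsed object, modelled as a plain Python dict with PDF names as strings. This model and the function names `repair_page_node` and `form_xobject_is_usable` are assumptions for illustration only. Choosing /Pages for a node that has a /Kids entry is also an assumption that goes beyond the case described above.

```python
def repair_page_node(node: dict) -> dict:
    """Return a copy of a page tree node with a missing /Type filled in."""
    fixed = dict(node)
    if "/Type" not in fixed:
        # Assumption: a node with /Kids is an intermediate /Pages node,
        # anything else in the page tree is treated as a leaf /Page.
        fixed["/Type"] = "/Pages" if "/Kids" in fixed else "/Page"
    return fixed

def form_xobject_is_usable(xobj: dict) -> bool:
    """A form XObject without /BBox cannot be used as it stands."""
    return xobj.get("/Subtype") != "/Form" or "/BBox" in xobj
```

A tool would apply the first function silently and use the second to decide whether it has to reconstruct the bounding box or drop the object.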

Compressed streams, such as embedded fonts, ICC color profiles and image data, are difficult to recover if they have been damaged. Some compression algorithms are more robust than others. Sometimes a stream, such as image data, can be partly recovered. The tool can then replace the missing parts with white pixels.
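
For a Flate-compressed image, partial recovery might be sketched as follows. The code feeds the stream to Python's `zlib` decompressor in small pieces, keeps whatever was decoded before an error occurs, and pads the rest with 0xFF bytes. It assumes 8 bits per component in a gray or RGB color space without a predictor, where 0xFF means white; in other color spaces a different fill value would be needed. The function names `inflate_partial` and `pad_with_white` are made up for this example.

```python
import zlib

def inflate_partial(raw: bytes, chunk: int = 64) -> bytes:
    """Decompress as much of a damaged Flate stream as possible."""
    decoder = zlib.decompressobj()
    out = bytearray()
    for start in range(0, len(raw), chunk):
        try:
            out += decoder.decompress(raw[start:start + chunk])
        except zlib.error:
            break  # keep what was decoded before the damaged part
    return bytes(out)

def pad_with_white(pixels: bytes, expected_len: int) -> bytes:
    """Cut or pad the sample data to the expected length using 0xFF."""
    return pixels[:expected_len].ljust(expected_len, b"\xff")
```

The expected length would come from the image dictionary (width, height and number of components). Bytes decoded shortly before the point of failure may themselves be wrong, so the result should be treated as an approximation.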

Badly formatted font programs occur very often. In most cases they can be reformatted and fixed. If not, they must be re-embedded from the original font. If the font is not available then it must either be replaced by a similar one or just removed.

The above cases are just a short list of examples; a full-featured repair tool can of course do a lot more. Please post a comment and let me know which repair tool feature would be important for you.

22 comments:

  1. Hi Hans,

    I would wish to have the following features in my PDF repair tool:

    Resolves any type of corruption error in a PDF file
    Repairs a corrupt file in just 2-3 clicks
    Recovers every bit of information from the PDF file
    Is free to use for anyone

    1. Hi Raj,
      Thank you for your comment. If I were in heaven then I would wish the same. Since I'm still in the real world I must live with the deficiencies of currently available software and the insight that not every bit of information can be recovered from a PDF file.

    2. This comment has been removed by a blog administrator.

    3. Hi Raj,
      Thank you for your comment. Unfortunately we had to delete it as our policy doesn't allow for advertising.

  2. Thanks for posting this article. I really want to recover my damaged PDF file. Anyone who wants to read more about this tool can visit here: Repair Damaged PDF File

  3. The 3-Heights™ PDF Analysis & Repair component detects and repairs corrupted PDF documents in automated processing procedures. It repairs defective or illegible PDF documents or restores them as far as possible. For further information visit our product details on http://bit.ly/1AGOXLh

  4. This comment has been removed by a blog administrator.

  5. Hi Aeldra - Thank you for your comment. Unfortunately we had to delete it as our policy doesn't allow for advertising.

  6. This comment has been removed by a blog administrator.

    1. Hi Friska - Thank you for your comment. Unfortunately we had to delete it as our policy doesn't allow for advertising.

  7. This comment has been removed by a blog administrator.

    1. Hi Alex - Thank you for your comment. Unfortunately we had to delete it as our policy doesn't allow for advertising.

  8. This comment has been removed by a blog administrator.

    1. Hi Alex - we had to delete your comment as our policy doesn't allow for advertising.

  9. This comment has been removed by a blog administrator.

    1. Hi Lesli - nice to read "I just wanna say thank you for sharing the content and wish you all the best for your website and your whole team." - but we had to delete your comment as our policy doesn't allow for advertising.

  10. This comment has been removed by a blog administrator.

  11. This comment has been removed by a blog administrator.

    1. Hi Mehul,
      thanks for dropping by and leaving the following comment: "You are sharing a great information about PDF Recovery. If you want an easy solution for repairing PDF file, then I suggest you can see this link." We have deleted your comment though, as it does not follow our policy concerning advertising.

  12. This comment has been removed by a blog administrator.

    1. Hi Stiven,
      thank you for your comment saying: "The post mentioned above is good and quite informative for those uses who want to repair PDF file...". As our policy does not allow for external linking we had to remove your comment. Please feel free to return if you have more valuable input.
