12 Jun 2007

Scanning To PDF

Posted by khk

Rick Borstein, who writes a blog about “Acrobat for Legal Professionals”, has two new articles: one about Troubleshooting Acrobat OCR, and one with more information about changes in Acrobat 8, specifically the Fix for Renderable Text Issue, which I had not heard about before.

The first article has very good information about how to create scans that can be OCR’ed with Acrobat without any problems.

So, how do I scan documents to PDF? Not with Acrobat! I’ve had too many problems with Acrobat 6 and 7 on my Mac. It also took quite some time before I was able to scan with Acrobat 8 at all, so my confidence in it is not the best. Previous versions of Acrobat ruined my scan jobs too often by crashing partway through 100 or 200 pages.

I never had any problems when I used the Mac’s Image Capture application. It can save scans as TIFF files, which can then be imported into Acrobat with the “Create PDF From Multiple Files” function. Once the images are in Acrobat, you can use Acrobat’s OCR function to make the files searchable. There is one small problem with this approach: Image Capture names the files with a base filename followed by the page number, which runs from 1 up to the last page (e.g. 100). Because the page numbers have no leading zeros, files sorted by name come out as 1, 10, 100, 11, …, 2, so the pages would end up in Acrobat in the wrong order and need some serious reordering. I have a small Perl script that takes care of that.
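The Perl script itself isn’t part of this post. What follows is a minimal Python sketch of the same idea, not the original script: it pads each page number with leading zeros, sized to the largest page number, so that a plain name sort matches page order. The filename pattern (`Scan 7.tif`-style names), the function names, and the script name `pad_pages.py` are all assumptions for illustration; adjust the regular expression to the names Image Capture actually produces.

```python
import os
import re
import sys

# Hypothetical naming pattern: a base name, a page number, and a .tif/.tiff
# extension, e.g. "Scan 7.tif". This is an assumption; adjust it to whatever
# names Image Capture actually produces on your machine.
PAGE_PATTERN = re.compile(r"^(?P<base>.*?)(?P<num>\d+)(?P<ext>\.tiff?)$", re.IGNORECASE)


def plan_renames(filenames):
    """Return (old, new) pairs that zero-pad the trailing page number.

    The pad width comes from the largest page number, so pages 1..100 become
    001..100 and sort in page order as plain strings. Names that don't match
    the pattern, or that are already padded, are left out.
    """
    matched = []
    for name in filenames:
        m = PAGE_PATTERN.match(name)
        if m:
            matched.append((name, m))
    if not matched:
        return []
    width = max(len(str(int(m.group("num")))) for _, m in matched)
    renames = []
    for name, m in matched:
        new_name = f"{m.group('base')}{int(m.group('num')):0{width}d}{m.group('ext')}"
        if new_name != name:
            renames.append((name, new_name))
    return renames


def rename_in_directory(directory):
    """Apply plan_renames() to the files in one directory."""
    for old, new in plan_renames(sorted(os.listdir(directory))):
        target = os.path.join(directory, new)
        if os.path.exists(target):
            # Don't overwrite an existing file; report it and move on.
            print(f"skipping {old}: {new} already exists", file=sys.stderr)
            continue
        os.rename(os.path.join(directory, old), target)
        print(f"{old} -> {new}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: pad_pages.py DIRECTORY", file=sys.stderr)
    else:
        rename_in_directory(sys.argv[1])
```

Run it on the folder of scans before using “Create PDF From Multiple Files”: with pages 1 to 100, `Scan 7.tif` becomes `Scan 007.tif`, and the import order matches the page order without any manual reordering.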

For most documents Acrobat’s OCR is good enough, but I am considering buying Abbyy’s FineReader, which unfortunately is only available for Windows, so I would run it under Parallels Desktop on my Mac. In my opinion, FineReader’s OCR does the best job of any consumer-class package I’ve tested. I’m just waiting for a demanding scan job…
