

Open source OCR sucks

Tonight I tried text recognition with various open source tools. The input was a set of images packaged as a PDF. The text in the images looked bad, but it was readable.

To summarize my experience:

  • Too much reading required
  • Too much hassle to convert between various input formats accepted by the tools
  • Totally unacceptable results
  • Even segmentation faults in some of the apps

None of the tools came even close to what I expected. Maybe it was my fault, but I cannot spend a day each time I need to do a simple job that I do less than once a month.

In the end I did the job by googling for "Online OCR" and using (guess what?!) http://www.onlineocr.net/ for the first five pages. It had a limit of five pages per hour for unregistered users (and five pages total for registered ones), so I registered and OCRed the sixth and last page.

BTW, just to prove my point about not reading enough, I later found this site, http://www.free-ocr.com/, which also did the job and used one of the tools I had tried: Tesseract.
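For the record, the format-conversion hassle for images packaged as a PDF can probably be reduced to two command-line steps: render each page to a PNG, then feed each PNG to Tesseract. Below is a minimal sketch of that idea, not what I actually ran. It assumes the `pdftoppm` (from poppler-utils) and `tesseract` binaries are installed and on the PATH; the function names, the `page` file prefix, the 300 DPI default and the `eng` language code are my own illustrative choices, and nothing here says the results on bad-looking scans would be any better.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Build the argv that renders every PDF page to <prefix>-<n>.png.

    Assumption: pdftoppm from poppler-utils is installed.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argv that OCRs one image.

    Tesseract appends ".txt" to output_base on its own.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir, lang="eng"):
    """Hypothetical helper: convert a scanned PDF to text, page by page."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / "page"

    # Step 1: PDF -> one PNG per page.
    subprocess.run(pdftoppm_command(pdf_path, prefix), check=True)

    # Step 2: each PNG -> a text file next to it, collected in page order.
    texts = []
    for image in sorted(work_dir.glob("page-*.png")):
        base = image.with_suffix("")  # e.g. work_dir/page-1
        subprocess.run(tesseract_command(image, base, lang), check=True)
        texts.append(base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf", "ocr_out")` (both names are placeholders) would leave the page images and text files in `ocr_out` and return the combined text. With `check=True`, a crashing tool raises an exception instead of failing silently.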

Posted in dir: /blog/
Tags: linux ocr oss
