Using Tesseract OCR on Mac OS X

rpes (Robert Pes, T-Plan)

Post by rpes » Tue Oct 25, 2011 9:35 am

Recently solved by our support team.

Although Tesseract OCR is not distributed in binary form for Mac OS X, it can be compiled from source as follows. The steps were tested on Mac OS X 10.6.4 (Snow Leopard) with Xcode 3.2.4 and Xcode 3.2.6.

1. Make sure Xcode (also known as the Developer Tools) is installed. Without it, the build will fail because "make" or "autoconf" is missing.
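If you are not sure whether the Developer Tools are installed, a quick check from Terminal can save you a failed build. The following is a minimal sketch: check_build_tools is our own hypothetical helper name, and it only looks up make and autoconf (the two tools mentioned above) on your PATH using the shell's standard command -v.

```shell
#!/bin/sh
# Report which build tools are (not) on the PATH.
# Usage: check_build_tools [tool ...]   (defaults to: make autoconf)
# Returns 0 if every tool was found, 1 if at least one is missing.
check_build_tools() {
    if [ "$#" -eq 0 ]; then
        set -- make autoconf
    fi
    _missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

check_build_tools || echo "Install Xcode (Developer Tools) before continuing." >&2
```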

2. Download the latest Tesseract OCR source package from the Downloads list.
Note: the version we tested and verified was the Tesseract OCR 3.00 source code.

3. Open Terminal and change into the directory where you downloaded the file, e.g.:

Code:

cd /Users/<your_user_name>/Downloads/tesseract/3.00/source
Then extract the archive with the following command:

Code:

tar xvfz tesseract-3.00.tar
Finally, change into the extracted directory, e.g.:

Code:

cd /Users/<your_user_name>/Downloads/tesseract/3.00/source/tesseract-3.00
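Step 3 can also be scripted. The sketch below assumes every entry in the archive starts with the top-level tesseract-3.00/ folder (not ./); extract_source is a hypothetical helper name. It leaves out the z flag because both BSD tar (the one shipped with Mac OS X) and GNU tar detect gzip compression on their own when extracting, so the same command handles tesseract-3.00.tar and tesseract-3.00.tar.gz.

```shell
#!/bin/sh
# Extract a source archive into the directory that contains it and print
# the path of the top-level folder it unpacked (e.g. ./tesseract-3.00).
# Usage: extract_source /path/to/tesseract-3.00.tar[.gz]
extract_source() {
    archive=$1
    dest=$(dirname "$archive")
    # The leading path component of the first entry is the top-level folder.
    top=$(tar -tf "$archive" | head -n 1)
    top=${top%%/*}
    # No -z: tar detects gzip compression by itself when extracting.
    tar -xf "$archive" -C "$dest" || return 1
    echo "$dest/$top"
}
```

For example, running cd "$(extract_source tesseract-3.00.tar)" from the download directory extracts the archive and changes into the extracted folder in one go.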
4. In Terminal, compile and install Tesseract with the following sequence of commands:

Code:

./configure
make
sudo make install
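Before moving on, you may want to confirm that sudo make install put the binary where step 6 expects it. This sketch assumes the default ./configure prefix of /usr/local (no --prefix option is passed above); verify_tesseract_install is a hypothetical helper name.

```shell
#!/bin/sh
# Check that an executable tesseract binary exists under the install prefix.
# Usage: verify_tesseract_install [prefix]   (defaults to /usr/local)
verify_tesseract_install() {
    prefix=${1:-/usr/local}
    bin="$prefix/bin/tesseract"
    if [ -x "$bin" ]; then
        echo "OK: $bin"
    else
        echo "not found or not executable: $bin" >&2
        return 1
    fi
}
```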
5. Download at least the English language data package eng.traineddata.gz, unpack it and copy the eng.traineddata file it contains to /usr/local/share/tessdata/ as follows:

Code:

sudo cp eng.traineddata /usr/local/share/tessdata/
Repeat this step for every other language listed on the Tesseract downloads page that you need to recognize.

Be aware that, at the time of writing, the default English language pack named tesseract-ocr-3.01.eng.tar.gz did not work with the 3.00 build and caused the tesseract binary to crash with the "actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55" error. As the assertion suggests, that pack apparently contains more data entries than the 3.00 binary supports. A workaround is to use the eng.traineddata.gz package instead.
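Unpacking and copying a language pack can be combined into one function, which is handy when you install several languages. A minimal sketch, assuming the files you downloaded are named like eng.traineddata.gz; install_langdata is a hypothetical helper, and its destination parameter exists only so you can try it on a scratch directory first. Writing to /usr/local/share/tessdata normally needs root, so run the script that contains the function with sudo.

```shell
#!/bin/sh
# Unpack a *.traineddata.gz language pack into Tesseract's data directory.
# Needs write access to the destination (for the default path, run the
# enclosing script with sudo).
# Usage: install_langdata eng.traineddata.gz [tessdata_dir]
install_langdata() {
    gz=$1
    dest=${2:-/usr/local/share/tessdata}
    name=$(basename "$gz" .gz)    # eng.traineddata.gz -> eng.traineddata
    case $name in
        *.traineddata) ;;
        *) echo "not a .traineddata.gz file: $gz" >&2; return 1 ;;
    esac
    mkdir -p "$dest" || return 1
    # gunzip -c writes to stdout and keeps the .gz; the temp file avoids
    # leaving a truncated .traineddata behind if unpacking fails.
    gunzip -c "$gz" > "$dest/$name.tmp" && mv "$dest/$name.tmp" "$dest/$name" || {
        rm -f "$dest/$name.tmp"
        return 1
    }
    echo "installed $dest/$name"
}
```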

6. To integrate Tesseract OCR with Robot, go to Edit->Preferences in Robot's menu, select the Tesseract OCR panel, and set the Tesseract Command preference to "/usr/local/bin/tesseract $1 $2 $3".
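If the Robot test in step 7 fails, it helps to know whether Tesseract itself works. The sketch below calls the binary directly, outside Robot, using the Tesseract 3.x command-line form "tesseract imagename outputbase -l lang", which writes the recognized text to outputbase.txt. It does not depend on how Robot fills in $1, $2 and $3. tesseract_smoke_test is a hypothetical helper name, and you need to supply an image file yourself, for example a TIFF screenshot containing some text.

```shell
#!/bin/sh
# Run tesseract by hand on one image and print the recognized text.
# Usage: tesseract_smoke_test image.tif [lang] [tesseract_binary]
tesseract_smoke_test() {
    image=$1
    lang=${2:-eng}
    tess=${3:-/usr/local/bin/tesseract}
    # Explicit template: older Mac OS X mktemp requires one.
    outdir=$(mktemp -d "${TMPDIR:-/tmp}/tess.XXXXXX") || return 1
    # Tesseract writes its result to <outputbase>.txt.
    if "$tess" "$image" "$outdir/out" -l "$lang" && cat "$outdir/out.txt"; then
        status=0
    else
        echo "tesseract failed on $image" >&2
        status=1
    fi
    rm -rf "$outdir"
    return "$status"
}
```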

7. To test the integration, open the Script->Compareto Command window, set the Comparison Method drop-down to "tocr", and click Compare to recognize the text on the currently connected desktop. If Tesseract works, the result window reports no errors and the Comparison Result is 100%.

8. To uninstall, go back to your source directory in Terminal and run:

Code:

sudo make uninstall
