1. Install the leptonica dependency
Switching to the root user with su root before installing is recommended, to avoid permission errors during the build.
wget http://www.leptonica.org/source/leptonica-1.78.0.tar.gz
tar -xzvf leptonica-1.78.0.tar.gz
cd leptonica-1.78.0
./configure
make && make install
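Before moving on, it can help to confirm that the freshly built leptonica is visible to pkg-config, since the Tesseract configure step looks for it there. The sketch below is only an illustration: the function name check_lept is made up for this note, and it assumes leptonica was installed under the default /usr/local prefix, so that lept.pc ends up in /usr/local/lib/pkgconfig.

```shell
# Hypothetical helper (not part of leptonica or tesseract).
# Assumes the default /usr/local prefix used by "make install" above.
check_lept() {
    # Add the default install location to pkg-config's search path.
    PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    export PKG_CONFIG_PATH
    if version=$(pkg-config --modversion lept 2>/dev/null); then
        echo "leptonica $version found"
    else
        echo "leptonica not visible to pkg-config" >&2
        return 1
    fi
}

# Usage: check_lept
```

If the check fails even though the build succeeded, the install prefix is probably different from the one assumed here.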
2. Install Tesseract-OCR
Building as the root user is recommended here as well.
wget https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.1.0
tar -xvf 4.1.0
cd tesseract-4.1.0/
./autogen.sh
./configure
make && make install
sudo ldconfig
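The installation itself is straightforward; depending on the machine's hardware and the network connection, it may take 30-60 minutes.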
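Once the build finishes, a quick sanity check is to ask the binary for its version. The following is a minimal sketch; verify_tesseract is a hypothetical helper name, and it only assumes that the tesseract binary landed somewhere on PATH (/usr/local/bin with the default prefix).

```shell
# Hypothetical helper: report the installed tesseract version, or fail
# with a message if the binary is not on PATH.
verify_tesseract() {
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found in PATH" >&2
        return 1
    fi
    # The first line of the --version output names the program version.
    tesseract --version 2>&1 | head -n 1
}

# Usage: verify_tesseract
```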
3. Possible errors
- Error when running ./autogen.sh
./autogen.sh: line 59: bail_out: command not found
./autogen.sh: line 82: aclocal: command not found
Solution
yum install automake -y
yum install libtool -y
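To catch this kind of failure before running ./autogen.sh, the build tools can be checked up front. This is a small sketch rather than anything shipped with tesseract; missing_tools is a hypothetical helper, and the tool list in the usage line is an assumption about what autogen.sh typically needs.

```shell
# Hypothetical helper: print the name of every tool that is NOT on PATH.
# Prints nothing when all of the given tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: missing_tools aclocal automake autoconf libtoolize
```

Any name it prints points at a package that still has to be installed with yum.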
- Error when running make for tesseract
libtool: Version mismatch error. This is libtool 2.4.6, but the
libtool: definition of this LT_INIT comes from libtool 2.4.2.
libtool: You should recreate aclocal.m4 with macros from libtool 2.4.6
libtool: and run autoconf again.
Solution
Run the command autoreconf -ivf
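After autoreconf regenerates the build files, configure has to be run again so that a matching libtool script is produced, and then make can be retried. A minimal sketch of that sequence follows; rebuild_tesseract is a hypothetical helper, and the default directory name assumes the archive was unpacked as in step 2.

```shell
# Hypothetical helper: regenerate the autotools files and rebuild.
# The argument is the source directory (assumed default: tesseract-4.1.0).
rebuild_tesseract() {
    src_dir="${1:-tesseract-4.1.0}"
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$src_dir" || exit 1
        autoreconf -ivf && ./configure && make
    )
}

# Usage: rebuild_tesseract tesseract-4.1.0 && (cd tesseract-4.1.0 && make install)
```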
- Error when running tesseract after installation
$ tesseract 13.jpg result -l chi_sim
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Solution:
1. Download the pre-trained language data files
2. Place the files in the /usr/local/share/tessdata directory
Download location: https://github.com/tesseract-ocr/tessdata
chi_sim.traineddata Simplified Chinese
eng.traineddata English
enm.traineddata Middle English (this is not a digits model; eng.traineddata already covers digits)
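The two steps above can be scripted roughly as follows. This is a sketch under stated assumptions: the helper names and the TESSDATA_DIR variable are made up for this note, and the raw-file URL pattern assumes the files sit at the top level of the repository's main branch; adjust it if the repository layout differs.

```shell
# Target directory from the error message above; override if needed.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

# Hypothetical helper: build the download URL for one language code.
# Assumption: files are served from the "main" branch of the tessdata repo.
tessdata_url() {
    echo "https://github.com/tesseract-ocr/tessdata/raw/main/$1.traineddata"
}

# Hypothetical helper: print each language whose data file is missing.
missing_langs() {
    for lang in "$@"; do
        [ -f "$TESSDATA_DIR/$lang.traineddata" ] || echo "$lang"
    done
}

# Hypothetical helper: download only the missing files. Each file is
# written to a .part name first so a failed download is not mistaken
# for a complete file on the next run.
fetch_missing_langs() {
    for lang in $(missing_langs "$@"); do
        dest="$TESSDATA_DIR/$lang.traineddata"
        wget -O "$dest.part" "$(tessdata_url "$lang")" && mv "$dest.part" "$dest" || return 1
    done
}

# Usage: fetch_missing_langs eng chi_sim
```

After the files are in place, the original command (tesseract 13.jpg result -l chi_sim) should be able to load the language.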