Step 1: Install dependencies
yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel
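Before building, it can help to confirm that the packages above were actually installed. The following is a minimal sketch, not part of the original steps: `missing_packages` and the `PKG_QUERY` override are hypothetical names, and the default query assumes an RPM-based system where `rpm -q` exits non-zero for a package that is not installed.

```shell
# Print every package from the argument list that the query command
# does not report as installed.
# PKG_QUERY is a hypothetical hook for testing; it defaults to `rpm -q`.
missing_packages() {
    for pkg in "$@"; do
        # Left unquoted on purpose so "rpm -q" splits into command and flag.
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}
```

For example, `missing_packages libjpeg-devel libpng-devel libtiff-devel zlib-devel` should print nothing when all four packages are present.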
Step 2: Download and install Leptonica
Download: wget http://www.leptonica.org/source/leptonica-1.76.0.tar.gz
Extract: tar -zxvf leptonica-1.76.0.tar.gz
Install: cd leptonica-1.76.0
./configure
make && make install
Step 3: Download and install Tesseract-OCR
Download: wget -O tesseract-4.0.0-beta.3.tar.gz https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.3.tar.gz
Extract: tar -zxvf tesseract-4.0.0-beta.3.tar.gz
Install: cd tesseract-4.0.0-beta.3
./autogen.sh
If this fails with "Missing autoconf-archive. Check the build requirements"
Fix: yum install autoconf-archive, then run ./autogen.sh again
./configure
If this fails with "error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package"
Fix: follow the CSDN post at https://blog.csdn.net/xjmxym/article/details/79040514
After following that post, run:
./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && make install
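The Leptonica error in this step is often a sign that configure cannot find a new enough Leptonica through pkg-config, even though Step 2 installed one. The sketch below checks for that. It does not reproduce the linked post; it assumes Leptonica's `make install` placed `lept.pc` under /usr/local/lib/pkgconfig (the default prefix) and that `sort -V` from GNU coreutils is available. The function names are hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Ask pkg-config for the Leptonica version, also searching the default
# /usr/local prefix, and compare it with the minimum (1.74 by default).
check_leptonica() {
    min="${1:-1.74}"
    found="$(PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" \
        pkg-config --modversion lept 2>/dev/null)" || found=""
    if [ -z "$found" ]; then
        echo "Leptonica is not visible to pkg-config"
        return 1
    fi
    if version_ge "$found" "$min"; then
        echo "Leptonica $found is new enough (>= $min)"
    else
        echo "Leptonica $found is older than $min"
        return 1
    fi
}
```

If `check_leptonica` reports that Leptonica is not visible, exporting `PKG_CONFIG_PATH=/usr/local/lib/pkgconfig` before rerunning ./configure is one thing to try.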
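Step 4: Check which languages Tesseract-OCR supports
Run the installed binary by its full path: /usr/local/bin/tesseract --list-langs
Download the tessdata_fast repository from GitHub, upload it to /usr/local/share/, and rename the tessdata_fast directory to tessdata. (Downloading only the language packs you need is recommended, for example eng.traineddata and chi_sim.traineddata.)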
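Fetching only the needed language packs, as recommended above, could be scripted roughly as follows. This is a sketch under assumptions: the URL pattern assumes the files are served from the `main` branch of the tesseract-ocr/tessdata_fast repository, and `traineddata_url`, `fetch_langs`, and the `TESSDATA_DIR` override are hypothetical names.

```shell
# Build the download URL for one language pack.
# Assumption: files are served from the `main` branch of tessdata_fast.
traineddata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/main/%s.traineddata\n' "$1"
}

# Download the listed languages into the tessdata directory.
# TESSDATA_DIR is a hypothetical override; the default matches the path above.
fetch_langs() {
    dest="${TESSDATA_DIR:-/usr/local/share/tessdata}"
    mkdir -p "$dest" || return 1
    for lang in "$@"; do
        wget -O "$dest/$lang.traineddata" "$(traineddata_url "$lang")" || return 1
    done
}
```

After `fetch_langs eng chi_sim`, running /usr/local/bin/tesseract --list-langs again should list both languages.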
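Step 5: Recognize a given image file with Tesseract-OCR and write the result to a given file
Run: /usr/local/bin/tesseract <image path> <output base path> -l chi_sim
Demo: /usr/local/bin/tesseract /ftp/pub/0002-0001.jpg /ftp/pub/1 -l chi_sim
Tesseract appends .txt to the output base, so the demo writes its result to /ftp/pub/1.txt.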
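To run the same command over many images, a loop like the one below may be convenient. It is a sketch rather than part of the original procedure: `ocr_dir` and the `TESSERACT` override are hypothetical names, it only handles files ending in .jpg, and it writes each result next to its image.

```shell
# Run OCR on every .jpg in a directory, writing <name>.txt next to each image.
# TESSERACT is a hypothetical override for the binary path.
ocr_dir() {
    dir="$1"
    lang="${2:-chi_sim}"
    for img in "$dir"/*.jpg; do
        # Skip the unexpanded pattern when the directory has no .jpg files.
        [ -e "$img" ] || continue
        # Strip the extension; Tesseract appends .txt to this output base.
        "${TESSERACT:-/usr/local/bin/tesseract}" "$img" "${img%.*}" -l "$lang" || return 1
    done
}
```

For example, `ocr_dir /ftp/pub chi_sim` would process /ftp/pub/0002-0001.jpg with the output base /ftp/pub/0002-0001.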