Installing and configuring Tesseract
Install dependencies
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc
Download and extract Tesseract
tesseract
Build and install
cd tesseract-4.1.1
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install training tools.
make training
sudo make training-install
Download eng.traineddata
eng.traineddata
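Only eng.traineddata needs to be downloaded here. If you need other languages, download them as needed. There is no need to download everything: the full set is about 3 GB, which is fairly large.
Test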
$ tesseract 0384.jpg stdout
0 3 8 4
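If the command reports an error, look at the path in the error message, copy the eng.traineddata file into that missing path, and run the test again.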
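The copy step can also be scripted. The following is a minimal sketch, not part of the original setup: `install_traineddata` is a hypothetical helper name, and both paths in the usage comment are assumptions that should be replaced with your download location and with the path shown in your own error message.

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded: Path, missing: Path) -> Path:
    """Copy a downloaded .traineddata file to the path Tesseract reported as missing."""
    if not downloaded.is_file():
        raise FileNotFoundError(f"downloaded file not found: {downloaded}")
    # Create the target tessdata directory if it does not exist yet.
    missing.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the file contents and preserves timestamps where possible.
    shutil.copy2(downloaded, missing)
    return missing


# Example call (both paths are assumptions, adjust them to your machine):
# install_traineddata(
#     Path.home() / "Downloads" / "eng.traineddata",
#     Path("/usr/local/share/tessdata/eng.traineddata"),
# )
```

If the missing path is under a system directory such as /usr/local, the script may need to be run with elevated permissions.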
Using pytesseract
Reference
Install the dependency
pip install pytesseract
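pytesseract is only a wrapper around the tesseract executable, so it is worth confirming that Python can find the binary built above. The sketch below is one way to check this. `find_tesseract` and `report` are hypothetical helper names; `shutil.which`, `pytesseract.pytesseract.tesseract_cmd`, and `pytesseract.get_tesseract_version` are existing calls.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the executable found on PATH, or None if absent."""
    return shutil.which(name)


def report():
    """Describe which tesseract binary pytesseract will use."""
    path = find_tesseract()
    if path is None:
        return "tesseract not found on PATH"
    # Imported lazily so find_tesseract() works even without pytesseract installed.
    import pytesseract

    # Point pytesseract at the binary explicitly instead of relying on PATH lookup.
    pytesseract.pytesseract.tesseract_cmd = path
    return f"{path}: version {pytesseract.get_tesseract_version()}"
```

Calling `print(report())` after installation should show the path and version of the build, assuming `sudo make install` placed the binary on your PATH.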
Import and use
import pytesseract as pt
from PIL import Image
image = Image.open('0384.jpeg')
text = pt.image_to_string(image)
print(text)
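The command-line test above printed the digits separated by spaces ("0 3 8 4"), so the raw string usually needs a small cleanup step before use. The following is a sketch under stated assumptions: `normalize_digits` and `ocr_digits` are hypothetical helpers, and the `--psm 7` option (treat the image as a single line of text) is an assumption about what the image looks like, not something the original setup requires.

```python
def normalize_digits(raw: str) -> str:
    """Remove all whitespace (spaces, newlines, form feeds) from OCR output."""
    return "".join(raw.split())


def ocr_digits(path: str) -> str:
    """Run Tesseract on an image file and return the text without whitespace."""
    # Imported lazily so normalize_digits() can be used and tested on its own.
    import pytesseract as pt
    from PIL import Image

    with Image.open(path) as image:
        # lang and config are existing parameters of image_to_string;
        # "--psm 7" asks Tesseract to treat the image as one text line.
        raw = pt.image_to_string(image, lang="eng", config="--psm 7")
    return normalize_digits(raw)
```

For the sample image, `ocr_digits('0384.jpeg')` would be expected to return the digits as one token if recognition succeeds. Drop the `config` argument if your images contain more than one line.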