{"id":20792,"date":"2021-08-26T22:10:26","date_gmt":"2021-08-26T15:10:26","guid":{"rendered":"https:\/\/www.yamagiwa2000.com\/blog\/?p=20792"},"modified":"2021-08-26T22:10:26","modified_gmt":"2021-08-26T15:10:26","slug":"tesseract-pyocr-2021","status":"publish","type":"post","link":"https:\/\/www.yamagiwa2000.com\/blog\/?p=20792","title":{"rendered":"Tesseract and Pyocr Not Working Together? August 25, 2021"},"content":{"rendered":"<p><a data-flickr-embed=\"true\" data-footer=\"true\" href=\"https:\/\/www.flickr.com\/photos\/code_life\/34183849751\/in\/photolist-U5HazR-rSDV4m-x9G5bQ-24zNZKJ-24zP2wQ-xoZ2Vf-2m3oeuQ-2m3fEiG-6YdFvY-36M6g8-UskpPj-dGEsGR-35pL2P-7AvpDf-HcuTTe-3mJp1x-Akhkx-akfhms-CrLB9-tqDDYJ-ny7Cpp-8MDRhM-6vFjHH-qDw9Xy-DzKJ2-2m1VVqi-UuwgWj-x1SN4-6xDj9Q-nZdL2E-2m2b2Va-2m22ozn-Fubh2-zjC23-EFujF-ajzNez-7a6Nj6-2uaErh-9haeq4-8R12Ky-2iTFKkK-6dbp71-4TkWEe-6XcP4h-53fzWE-2jrYdY8-2iNdX6j-2j91536-cHbuHh-SZUVzy\" title=\"Code\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/live.staticflickr.com\/2824\/34183849751_74e22385fd_b.jpg\" width=\"1024\" height=\"576\" alt=\"Code\" \/><\/a><\/p>\n<p>There is an OCR application called Tesseract, originally developed at HP and later developed by Google, that is good at recognizing text in images, and there is a library called Pyocr that acts as a wrapper between Tesseract and Python. With this library you can easily do image recognition from Python, such as extracting text strings from images, which is very convenient.<\/p>\n<p>However, I currently have Tesseract and Pyocr installed on CentOS 7.0, and for some reason using them produces an error:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-bash\" data-lang=\"Bash\"><code>[root@c7 lott]# python extract_number.py\r\ntesseract 3.04.00\r\nleptonica-1.72\r\nlibgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0\r\n\r\npsm_parameter(): failed to get Tesseract version. 
AssumingTesseract &gt;= 4 --&gt; using option '--psm'\r\nTraceback (most recent call last):\r\n  File \"\/root\/.pyenv\/versions\/3.9.0\/lib\/python3.9\/site-packages\/pyocr\/tesseract.py\", line 454, in get_version\r\n    ver_string = ver_string.split(\" \")[1]\r\nIndexError: list index out of range\r\n\r\nDuring handling of the above exception, another exception occurred:\r\n\r\nTraceback (most recent call last):\r\n  File \"\/root\/.pyenv\/versions\/3.9.0\/lib\/python3.9\/site-packages\/pyocr\/tesseract.py\", line 168, in psm_parameter\r\n    version = get_version()\r\n  File \"\/root\/.pyenv\/versions\/3.9.0\/lib\/python3.9\/site-packages\/pyocr\/tesseract.py\", line 471, in get_version\r\n    raise TesseractError(\r\npyocr.error.TesseractError: (0, 'Unable to parse Tesseract version (spliting failed): []')\r\nNo. of data: NG&gt; prize1 0<\/code><\/pre>\n<\/div>\n<p>It looks like the error is raised in the get_version() function in tesseract.py. Reading further into that function:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>def get_version(set_env=True):\r\n    global g_version\r\n\r\n    # ... (omitted) ...\r\n\r\n    command = [TESSERACT_CMD, \"-v\"]\r\n\r\n    # only stdout is piped; stderr is not redirected, so anything\r\n    # tesseract writes to stderr goes to the parent's stderr (the terminal)\r\n    proc = subprocess.Popen(command,\r\n                            startupinfo=g_subprocess_startup_info,\r\n                            creationflags=g_creation_flags,\r\n                            stdout=subprocess.PIPE)\r\n    ver_string = proc.stdout.read()\r\n    ver_string = ver_string.decode('utf-8')<\/code><\/pre>\n<\/div>\n<p>The command and option [\"tesseract\", \"-v\"] are stored in the list command, which is passed to subprocess.Popen; the standard output of the resulting Popen object is then read with read() and checked for the version. The problem is that nothing is written to that Popen object's standard output. I added a print(ver_string) at a convenient spot to see what it contained, but nothing at all was printed, not even b'' or ''...<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code># merge stderr into stdout so the version text is captured\r\n# even if tesseract writes it to stderr\r\nproc = subprocess.Popen(command,\r\n                        startupinfo=g_subprocess_startup_info,\r\n                        creationflags=g_creation_flags,\r\n                        stdout=subprocess.PIPE,\r\n                        stderr=subprocess.STDOUT)<\/code><\/pre>\n<\/div>\n<p>In the end, redirecting standard error to standard output finally lets the Tesseract version information be read. But why does the version information go to standard error even though I'm passing the -v option? When I test with other commands, their version information comes out on standard output as expected...<\/p>\n<p>So is the problem Tesseract? Or is it subprocess.Popen? Hmm, maybe I need to read the Popen source too... I should also install Tesseract and Pyocr on another CentOS 7 environment and see whether it behaves the same or differently... Sorry, even after writing all this, I still don't know what the cause is. Having gotten this far, I thought &#8220;Oh, I should write a blog post,&#8221; so I just wrote down what was in my head. That's all for now.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is an OCR application called Tesseract, originally developed at HP and later developed by Google, that is good at recognizing text in images, and there is a library called Pyocr that acts as a wrapper between Tesseract and Python. With this library you can easily do image recognition from Python, such as extracting text strings from images, which is very convenient. However, I currently have Tesseract and Pyocr installed on CentOS 7.0, and for some reason using them produces an error. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[149],"tags":[12011,10043,12014,356,12013,10042,12012,10044,12015,6591],"_links":{"self":[{"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/20792"}],"collection":[{"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=20792"}],"version-history":[{"count":4,"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/20792\/revisions"}],"predecessor-version":[{"id":20796,"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/20792\/revisions\/20796"}],"wp:attachment":[{"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=20792"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=20792"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.yamagiwa2000.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=20792"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}