An Overview of the Tesseract OCR Engine

Authors:
R. Smith
Affiliations:
Google Inc.
Venue:
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
Year:
2007

Citing 0
Cited by 29:

Applying the OCRopus OCR System to Scholarly Sanskrit Literature
Sanskrit Computational Linguistics

Adapting the Tesseract open source OCR engine for multilingual OCR
Proceedings of the International Workshop on Multilingual OCR

Recent progress on the OCRopus OCR system
Proceedings of the International Workshop on Multilingual OCR

Language independent thresholding optimization using a Gaussian mixture modelling of the character shapes
Proceedings of the International Workshop on Multilingual OCR

Combined script and page orientation estimation using the Tesseract OCR engine
Proceedings of the International Workshop on Multilingual OCR

Robust pre-processing techniques for OCR applications on mobile devices
Mobility '09 Proceedings of the 6th International Conference on Mobile Technology, Application & Systems

Improving OCR accuracy for classical critical editions
ECDL'09 Proceedings of the 13th European conference on Research and advanced technology for digital libraries

Table detection in heterogeneous documents
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

Recognition driven page orientation detection
ICIP'09 Proceedings of the 16th IEEE international conference on Image processing

Building book inventories using smartphones
Proceedings of the international conference on Multimedia

Design, development and performance evaluation of reconfigured mobile Android phone for people who are blind or visually impaired
Proceedings of the 28th ACM International Conference on Design of Communication

Enhancement of historical printed document images by combining Total Variation regularization and Non-local Means filtering
Image and Vision Computing

Categorization of display ads using image and landing page features
Proceedings of the Third Workshop on Large Scale Data Mining: Theory and Applications

Text detection on charts and graphs
Pattern Recognition and Image Analysis

An experimental workflow development platform for historical document digitisation and analysis
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing

Ocropodium: open source OCR for small-scale historical archives
Journal of Information Science

Taming wild behavior: the input observer for obtaining text entry and mouse pointing measures from everyday computer use
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

An effective partition approach for elastic application development on mobile cloud computing
GPC'12 Proceedings of the 7th international conference on Advances in Grid and Pervasive Computing

For human eyes only: security and usability evaluation
Proceedings of the 2012 ACM workshop on Privacy in the electronic society

Gas meter reading from real world images using a multi-net system
Pattern Recognition Letters

Computing precision and recall with missing or uncertain ground truth
GREC'11 Proceedings of the 9th international conference on Graphics Recognition: new trends and challenges

Scene text recognition and tracking to identify athletes in sport videos
Multimedia Tools and Applications

Semantic-Feature-Based Object Recognition by Using Internet Data Mining
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03

Text detection in chart images
Pattern Recognition and Image Analysis

ScreenPass: secure password entry on touchscreen devices
Proceeding of the 11th annual international conference on Mobile systems, applications, and services

Early modern OCR project (eMOP) at Texas A&M University: using Aletheia to train Tesseract
Proceedings of the 2013 ACM symposium on Document engineering

Can we build language-independent OCR using LSTM networks?
Proceedings of the 4th International Workshop on Multilingual OCR

Multilingual OCR research and applications: an overview
Proceedings of the 4th International Workshop on Multilingual OCR

A saliency-driven robotic head with bio-inspired saccadic behaviors for social robotics
Autonomous Robots

Abstract

The Tesseract OCR engine, which took part as the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy [1], is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, in particular the line finding, the features and classification methods, and the adaptive classifier.