N-Tuple Features for OCR Revisited

Authors:
Dz-Mou Jung;M. S. Krishnamoorthy;George Nagy;Andrew Shapira
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1996

Abstract

N-tuple features for optical character recognition have received only scattered attention since the 1960s. Our main purpose here is to show that advances in computer technology and computer science compel renewed interest. N-tuple features are useful for printed character classification because they indicate the presence or absence of a given rigid configuration of n black and white pixels in a pattern. Desirable n-tuples fit each pattern of a specified (positive) training set of characters in at least p different shift positions, and fail to fit each pattern of a specified (negative) training set by at least n − q pixels in each shift position. In this work we prove that the problem of finding a distinguishing n-tuple is NP-complete, by examining a natural subproblem with binary strings called the missing configuration problem. The NP-completeness result notwithstanding, distinguishing n-tuples are found automatically in a few seconds on contemporary workstations. We exhibit a practical search algorithm for generating, from a small training set, a collection of n-tuples with low class-conditional correlation and with specified design parameters n, p, and q. The generator, which is available on the Internet, is empirically shown to be effective through a comparison with a benchmark generator. We show experimentally that the design parameters provide a useful tradeoff between distinguishing power and generation time, and also between the conditional probabilities for the positive and negative classes. We explore the feature probabilities obtainable for various dichotomies, and show that the design parameters control the feature probabilities.
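
The fit criterion stated in the abstract can be illustrated with a minimal Python sketch. This is not the authors' generator; it only checks whether one given n-tuple meets the p and q conditions on small binary patterns. The names `mismatches`, `shifts`, and `is_distinguishing` are hypothetical, and the sketch assumes that offsets are non-negative and that only shifts keeping the whole n-tuple inside the pattern are considered (the paper may treat borders differently).

```python
from typing import List, Tuple

Pattern = List[List[int]]            # 1 = black pixel, 0 = white pixel
NTuple = List[Tuple[int, int, int]]  # (row offset, column offset, required colour)


def mismatches(pattern: Pattern, ntuple: NTuple, r: int, c: int) -> int:
    """Count n-tuple pixels whose colour disagrees with the pattern at shift (r, c)."""
    return sum(1 for dr, dc, colour in ntuple if pattern[r + dr][c + dc] != colour)


def shifts(pattern: Pattern, ntuple: NTuple) -> List[Tuple[int, int]]:
    """All shifts that keep every n-tuple pixel inside the pattern (an assumption)."""
    height, width = len(pattern), len(pattern[0])
    max_dr = max(dr for dr, _, _ in ntuple)
    max_dc = max(dc for _, dc, _ in ntuple)
    return [(r, c) for r in range(height - max_dr) for c in range(width - max_dc)]


def is_distinguishing(ntuple: NTuple, positives: List[Pattern],
                      negatives: List[Pattern], p: int, q: int) -> bool:
    """Check the two conditions from the abstract for a single n-tuple."""
    n = len(ntuple)
    # Each positive pattern must be fitted exactly in at least p shift positions.
    for pattern in positives:
        fits = sum(1 for r, c in shifts(pattern, ntuple)
                   if mismatches(pattern, ntuple, r, c) == 0)
        if fits < p:
            return False
    # Each negative pattern must be missed by at least n - q pixels at every shift.
    for pattern in negatives:
        if any(mismatches(pattern, ntuple, r, c) < n - q
               for r, c in shifts(pattern, ntuple)):
            return False
    return True


# Toy data: a vertical bar (positive class) and a horizontal bar (negative class).
vertical = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
horizontal = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]
# A 3-tuple asking for three black pixels stacked vertically.
bar_tuple = [(0, 0, 1), (1, 0, 1), (2, 0, 1)]
```

On this toy data, `is_distinguishing(bar_tuple, [vertical], [horizontal], p=1, q=1)` holds: the tuple fits the vertical bar at one shift and misses the horizontal bar by two pixels at every shift. Tightening either parameter (`p=2` or `q=0`) rejects the same tuple, which mirrors the tradeoff the abstract attributes to the design parameters.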