Multifont OCR postprocessing system

Authors:
W. S. Rosenbaum; J. J. Hilliard
Affiliations:
IBM Federal Systems Division, Gaithersburg, Maryland; IBM Federal Systems Division, Gaithersburg, Maryland
Venue:
IBM Journal of Research and Development
Year:
1975

Abstract

A series of techniques is being developed to postprocess noisy, multifont, nonformatted OCR data on a word basis to 1) determine if a field is alphabetic or numeric; 2) verify that an alphabetic word is legitimate; 3) fetch from a dictionary a set of potential entries using a garbled word as a key; and 4) error-correct the garbled word by selecting the most likely dictionary word. Four algorithms were developed using a technique called vector processing (representing alphabetic words as numeric vectors) and by applying Bayes maximum likelihood solutions to correct the OCR output. The result was a software simulator that processed sequential fields generated by the Advanced Optical Character Reader (in use by the U.S. Postal Service in New York City), performed the four functions indicated above, and selected the correct alphabetic word from a dictionary of 62,000 entries.
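
Functions 3 and 4 can be illustrated with a minimal Python sketch. The abstract does not give the vector encoding or the error statistics, so everything here is an assumption made for illustration: the toy vector (word length, sum of letter positions), the fetch tolerance, the confusable-character pairs, and the probabilities in `char_likelihood` are all hypothetical and are not taken from the paper. The sketch also only compares candidates of the same length as the garbled word, so it handles substitution errors but not dropped or split characters.

```python
import math

# Hypothetical (intended, observed) confusion pairs; illustrative only.
CONFUSABLE = {("O", "Q"), ("Q", "O"), ("I", "L"), ("L", "I"), ("E", "F"), ("F", "E")}


def char_likelihood(observed, intended):
    """Assumed P(observed | intended) for one character (made-up values)."""
    if observed == intended:
        return 0.90
    if (intended, observed) in CONFUSABLE:
        return 0.05
    return 0.001


def word_vector(word):
    """Toy numeric vector for an uppercase word: (length, sum of A=1..Z=26)."""
    return (len(word), sum(ord(c) - ord("A") + 1 for c in word))


def fetch_candidates(garbled, dictionary, tolerance=12):
    """Use the garbled word's vector as a key: same length, nearby magnitude."""
    length, magnitude = word_vector(garbled)
    return [
        w for w in dictionary
        if len(w) == length and abs(word_vector(w)[1] - magnitude) <= tolerance
    ]


def correct(garbled, dictionary, priors=None):
    """Pick the candidate maximizing log prior + log likelihood of the garble."""
    candidates = fetch_candidates(garbled, dictionary)
    if not candidates:
        return None

    def log_score(word):
        prior = priors.get(word, 1.0) if priors else 1.0  # uniform if no priors
        return math.log(prior) + sum(
            math.log(char_likelihood(o, t)) for o, t in zip(garbled, word)
        )

    return max(candidates, key=log_score)


if __name__ == "__main__":
    words = ["BOSTON", "AUSTIN", "DALLAS", "DENVER"]
    print(correct("BQSTON", words))
```

With the toy dictionary above, the fetch step keeps only the words whose vector lies near that of the garbled input, and the likelihood step then ranks those few candidates. Narrowing the search before scoring is what makes a dictionary of tens of thousands of entries practical to search.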