Recognition strategies for general handwritten text documents

Authors:
M. Shridhar;G. F. Houle;F. Kimura
Affiliations:
(Correspd. E-mail: mals@umich.edu) University of Michigan-Dearborn, MI, USA;Kappa Image LLC Oakland, CA, USA;Mie University, Tsu City, Japan
Venue:
Integrated Computer-Aided Engineering
Year:
2009

Abstract

This paper presents document recognition strategies for two important applications: 1) recognition of text documents containing multiple lines of text data and 2) a comprehensive check image reader. It describes the challenges in finding and recognizing the fields of interest across these broad document types. The focus of this study is a project to assess the feasibility of recognizing essays written by middle school students. In this project, a scanned document is processed to extract individual lines of text from the essay, extract individual words from each line, and then apply word recognition techniques to the extracted words. While individual lines are extracted accurately using gap information between lines, extraction of words is a much bigger challenge. Because the essays are written by middle school children, word boundaries are ambiguous, especially when words are written in a non-cursive, discrete style. In these cases the gaps between words are sometimes smaller than the gaps between characters within a word, causing errors in estimating the location of word boundaries. In the second application, we treat a bank check as a complete document that has a regular structure and different fields of interest that need to be extracted and recognized. The key challenges are accurate extraction of the different fields, followed by accurate recognition of the data in those fields. Many commercial banks have deployed automatic check processing with great success.
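
The gap-based extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a binarized page stored as a list of rows of 0/1 pixels (1 = ink), uses simple projection profiles to find gaps, and splits words with a fixed gap threshold. The function names `find_runs`, `segment_lines`, and `segment_words`, and the `word_gap` parameter, are hypothetical and chosen only for illustration.

```python
def find_runs(profile, min_gap=1):
    """Return (start, end) spans of non-zero entries in a projection profile.

    Spans separated by fewer than min_gap zero entries are merged, so
    min_gap acts as the smallest gap that counts as a boundary.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at the first blank entry
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same unit
        else:
            merged.append((s, e))
    return merged


def segment_lines(image):
    """Split a page into text lines using blank rows as line gaps."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile, min_gap=1)


def segment_words(image, line, word_gap):
    """Split one text line into words using blank columns of width >= word_gap."""
    top, bottom = line
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return find_runs(col_profile, min_gap=word_gap)


# Tiny synthetic page: two text lines separated by one blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 1, 1],
]

lines = segment_lines(page)
words_first_line = segment_words(page, lines[0], word_gap=3)
```

A fixed `word_gap` is exactly what breaks down in the situation the abstract describes: when a writer leaves less space between words than between the characters of a word, no single threshold separates the two kinds of gap, so any practical system would need something more adaptive than this sketch.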