Recognition strategies for general handwritten text documents

  • Authors:
  • M. Shridhar; G. F. Houle; F. Kimura

  • Affiliations:
  • (Corresponding e-mail: mals@umich.edu) University of Michigan-Dearborn, MI, USA; Kappa Image LLC, Oakland, CA, USA; Mie University, Tsu City, Japan

  • Venue:
  • Integrated Computer-Aided Engineering
  • Year:
  • 2009

Abstract

This paper presents document recognition strategies for two important applications: 1) recognition of text documents containing multiple lines of text data and 2) a comprehensive check image reader. It describes the challenges in finding and recognizing the fields of interest across these broad document types. The focus of this study is a project to assess the feasibility of recognizing essays written by middle school students. In this project, a scanned document is processed to extract individual lines of text from the essay, extract individual words from each line, and then apply word recognition techniques to the extracted words. While individual lines are extracted accurately using gap information between lines, extraction of words is a much bigger challenge. Since the essays are written by middle school children, word boundaries are ambiguous, especially when words are written in a non-cursive, discrete style. In these cases the gaps between words are sometimes smaller than the gaps between characters within a word, causing errors in estimating the location of word boundaries. In the second application, we treat a bank check as a complete document that has a regular structure and different fields of interest that need to be extracted and recognized. The key challenges are accurate extraction of the different fields followed by accurate recognition of the data in those fields. Many commercial banks have deployed automatic check processing with great success.
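
The gap-based line and word extraction described above can be illustrated with a minimal sketch. The abstract does not specify the authors' actual method, so everything here is an assumption: the page is a binary image given as rows of 0/1 values, lines are separated by blank rows (a projection profile), and words are separated by column gaps of at least a fixed threshold. The names `runs_of_ink`, `extract_lines`, `extract_words`, and `min_word_gap` are hypothetical.

```python
def runs_of_ink(profile):
    """Return (start, end) pairs for maximal runs of non-zero entries (end exclusive)."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def extract_lines(image):
    """Split a binary image into text lines at blank rows (assumed approach)."""
    row_profile = [sum(row) for row in image]
    return runs_of_ink(row_profile)


def extract_words(image, top, bottom, min_word_gap):
    """Split one text line into words at column gaps of at least min_word_gap."""
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    words = []
    for start, end in runs_of_ink(col_profile):
        if words and start - words[-1][1] < min_word_gap:
            words[-1] = (words[-1][0], end)  # gap too small: extend the current word
        else:
            words.append((start, end))
    return words


# Toy page: two text lines; the first holds three ink blobs with gaps of 1 and 3 columns.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]

lines = extract_lines(page)
top, bottom = lines[0]
words_wide = extract_words(page, top, bottom, min_word_gap=3)
words_narrow = extract_words(page, top, bottom, min_word_gap=1)
```

With `min_word_gap=3` the first line splits into two words, while `min_word_gap=1` splits it into three. A single fixed threshold of this kind is exactly what breaks down when gaps between words are smaller than gaps between characters, which is the difficulty the abstract reports for discrete handwriting.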