Text Identification in Noisy Document Images Using Markov Random Field

  • Authors:
  • Yefeng Zheng;Huiping Li;David Doermann

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

In this paper we address the problem of identifying text in noisy documents. We segment and distinguish handwriting from machine printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine printed text and handwriting are significantly different. Our novelty is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine printed text and handwriting from noise. We further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting and noise to rectify the mis-classification. Experimental results show our approach is promising and robust, and can significantly improve the page segmentation results in noisy documents.
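
The two-stage idea in the abstract (a Fisher classifier for the initial labels, then an MRF that uses neighboring blocks to correct isolated mistakes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the features, the neighborhood system, the clique potentials, or the inference procedure. The sketch assumes a two-class Fisher linear discriminant, a Potts-style pairwise penalty, and iterated conditional modes (ICM) as the relaxation step; the names `fit_fisher`, `icm_refine`, `unary`, `neighbors`, and `beta` are all hypothetical.

```python
import numpy as np

def fit_fisher(X0, X1):
    """Two-class Fisher linear discriminant: returns projection w and threshold t."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix (sum of the two per-class scatters).
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Small ridge term keeps Sw invertible (an assumption for numerical stability).
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m1 - m0)
    t = 0.5 * (w @ m0 + w @ m1)  # midpoint of the projected class means
    return w, t

def icm_refine(unary, neighbors, beta=1.0, iters=10):
    """Iterated conditional modes on a Potts-style MRF (assumed model).

    unary[i, k]  : cost of giving block i label k (lower is better)
    neighbors[i] : indices of blocks adjacent to block i
    beta         : penalty added for each neighbor with a different label
    """
    unary = np.asarray(unary, dtype=float)
    labels = unary.argmin(axis=1)          # start from the classifier's decision
    label_ids = np.arange(unary.shape[1])
    for _ in range(iters):
        changed = False
        for i in range(len(labels)):
            cost = unary[i].copy()
            for j in neighbors[i]:
                # Penalize every candidate label that disagrees with neighbor j.
                cost += beta * (label_ids != labels[j])
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:                    # converged: no label flipped this sweep
            break
    return labels
```

In this sketch, `unary` would come from the classifier scores (for example, distances to the Fisher decision thresholds), and `beta` controls how strongly a block is pulled toward the labels of its neighbors. A block whose own evidence weakly favors "noise" but which sits between blocks confidently labeled as printed text would be relabeled as printed text.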