Extraction of text areas in printed document images

  • Authors:
  • Jean Duong;Myriam Côte;Hubert Emptoz;Ching Y. Suen

  • Affiliations:
  • École de Technologie Supérieure (ETS), Montréal, Québec, Canada;École de Technologie Supérieure (ETS), Montréal, Québec, Canada;Institut National des Sciences Appliquées (INSA) de Lyon, Villeurbanne Cedex, France;Concordia University, Montréal, Québec, Canada

  • Venue:
  • DocEng '01 Proceedings of the 2001 ACM Symposium on Document Engineering
  • Year:
  • 2001

Abstract

In this paper, we present a document analysis system designed to extract regions of interest from greyscale document images. The system works in two steps. First, regions of interest are retrieved via cumulative gradient considerations. Second, the collected areas are clustered into text zones and non-text areas using geometric and texture features. In the classification module, we introduce an entropic heuristic. Experiments on the MediaTeam Document Database show the relevance of this criterion.
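
The abstract does not specify how the cumulative gradient or the entropic heuristic is computed, so the following Python sketch is only one plausible reading and not the authors' method. It assumes that "cumulative gradient" can be approximated by summing absolute horizontal intensity differences along each row, and that an entropy criterion can be approximated by the Shannon entropy of a histogram of feature values. The function names, the threshold, and the bin count are all hypothetical.

```python
import numpy as np


def cumulative_gradient_profile(image):
    """Sum absolute horizontal intensity differences along each row.

    Assumption: this row-wise sum stands in for the paper's
    "cumulative gradient"; the actual definition is not in the abstract.
    """
    img = np.asarray(image, dtype=float)
    grad = np.abs(np.diff(img, axis=1))  # horizontal gradient magnitude
    return grad.sum(axis=1)              # one accumulated value per row


def candidate_bands(profile, threshold):
    """Return (start, end) row ranges where the profile exceeds threshold.

    `end` is exclusive. The threshold is a hypothetical parameter.
    """
    bands = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                    # a band opens on this row
        elif value <= threshold and start is not None:
            bands.append((start, i))     # the band closed on the previous row
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def shannon_entropy(values, bins=16):
    """Shannon entropy (in bits) of a histogram of `values`.

    Assumption: a generic entropy measure, used here only to illustrate
    what an entropy-based criterion might look like.
    """
    hist, _ = np.histogram(np.asarray(values, dtype=float), bins=bins)
    total = hist.sum()
    if total == 0:
        return 0.0
    p = hist[hist > 0] / total
    return float(abs((p * np.log2(p)).sum()))


if __name__ == "__main__":
    # Synthetic page: flat grey background with a high-contrast band in rows 2..4.
    page = np.full((10, 20), 128.0)
    page[2:5, ::2] = 0.0
    page[2:5, 1::2] = 255.0

    profile = cumulative_gradient_profile(page)
    for start, end in candidate_bands(profile, threshold=0.0):
        band = page[start:end]
        print(start, end, shannon_entropy(band.ravel()))
```

In this sketch, rows with many sharp intensity transitions accumulate a large gradient sum and are grouped into candidate bands, and the entropy of each band's values could then serve as one feature among the geometric and texture features a classifier would use.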