On Segmentation of Documents in Complex Scripts

  • Authors:
  • K. S. Kumar; S. Kumar; C. Jawahar

  • Affiliations:
  • International Institute of Information Technology, Hyderabad, India; International Institute of Information Technology, Hyderabad, India; International Institute of Information Technology, Hyderabad, India

  • Venue:
  • ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
  • Year:
  • 2007

Abstract

Document image segmentation algorithms primarily aim at separating text and graphics in the presence of complex layouts. However, for many non-Latin scripts, segmentation becomes a challenge due to the characteristics of the script. In this paper, we empirically demonstrate that successful algorithms for Latin scripts may not be very effective for Indic and complex scripts. We explain this based on the differences in the spatial distribution of symbols in the scripts. We argue that the visual information used for segmentation needs to be enhanced with other information like script models for accurate results.