A generalised framework for script identification

  • Authors:
  • Gopal Datt Joshi; Saurabh Garg; Jayanthi Sivaswamy

  • Affiliations:
  • International Institute of Information Technology, Centre for Visual Information Technology, Gachibowli, 500 032, Hyderabad, Andhra Pradesh, India (all three authors)

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2007

Abstract

Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting script-specific OCR in a multilingual environment. In this paper, we model script identification as a texture classification problem and examine a global approach inspired by human visual perception. A generalised, hierarchical framework is proposed for script identification, together with a set of energy and intensity space features for the task. The framework serves to establish the utility of a global approach to the classification of scripts. It has been tested on two datasets, one of 10 Indian scripts and one of 13 world scripts, and the identification accuracy obtained across the two datasets is above 94%. The results demonstrate that the framework can be used to develop solutions for script identification from document images across a large set of script classes.
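
To illustrate what treating a script as a texture might look like in practice, the following is a minimal sketch of global oriented-energy features computed over a whole text image. The abstract does not specify the filters or the exact features used, so everything here is an assumption made for illustration: the Gabor filter bank, the function names (`gabor_kernel`, `filter_fft`, `energy_features`), and the parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Even-symmetric Gabor kernel (assumed filter, not the paper's)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the carrier oscillates.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    # Remove the mean so uniform regions produce no response.
    return kernel - kernel.mean()


def filter_fft(image, kernel):
    """Circular convolution via the FFT, kernel zero-padded to image size."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def energy_features(image, n_orientations=4, wavelength=8.0):
    """Normalised mean filter energy per orientation for the whole image."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        kernel = gabor_kernel(21, wavelength, theta, sigma=4.0)
        response = filter_fft(image, kernel)
        feats.append(np.mean(response ** 2))
    feats = np.array(feats)
    total = feats.sum()
    # Normalise so the vector describes how energy is distributed over orientations.
    return feats / total if total > 0 else feats
```

A feature vector of this kind could be fed to any standard classifier. In a hierarchical arrangement, a first stage might separate broad groups of scripts and later stages might refine the decision within each group, but the structure actually used in the paper is not described in the abstract.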