Curvature feature distribution based classification of Indian scripts from document images

  • Authors:
  • Gaurav Sharma;Ritu Garg;Santanu Chaudhury

  • Affiliations:
  • Indian Institute of Technology, Delhi;Indian Institute of Technology, Delhi;Indian Institute of Technology, Delhi

  • Venue:
  • Proceedings of the International Workshop on Multilingual OCR
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a framework for classification of text document images based on their script. We deal with the domain of Indian scripts which has high inter script similarities. Indian scripts have characteristic curvature distributions which help in visual discrimination of scripts. We use edge direction based features to capture the distribution of curvature. We also use a recently proposed feature selection algorithm to obtain the most discriminating curvature features. We form hierarchy (automatically) based on statistical distances between the script models. Hierarchy allows us to group similar scripts at one level and then focus on the classification between the similar scripts at the next level leading to improvement in accuracy. We show experiments and results on a large set of about 3400 images.