Summarization of compressed text images: an experience on Indic script documents

  • Authors:
  • Utpal Garain

  • Affiliations:
  • Indian Statistical Institute, Kolkata, India

  • Venue:
  • Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Automatic summarization of JBIG2 coded textual images is discussed. Compressed images are partially decompressed to compute relevant features. The feature extraction method is free from using any character recognition module. Summary sentences are ranked. Experiment considers documents in Indic scripts that lack in having any efficient OCR systems. Script independent aspect of the approach is highlighted through use of two most popular Indic scripts. Sentence selection efficiency of about 61% is achieved when judged against man-made summarization. A nonparametric (distribution-free) rank statistic shows a correlation coefficient of 0.33 as a measure of the (minimum) strength of the associations between sentence ranking by machine and human.