Automatic document orientation detection and categorization through document vectorization

  • Authors:
  • Shijian Lu;Chew Lim Tan

  • Affiliations:
  • National University of Singapore, Singapore;National University of Singapore, Singapore

  • Venue:
  • MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents an automatic orientation detection and categorization technique that is capable of detecting the orientation of multilingual documents with arbitrary skew and categorizing document images according to the underlying languages. We carry out orientation detection and categorization through document vectorization, which encodes document orientation and language information and converts each document image into an electronic document vector through the exploitation of the density and distribution of vertical component runs. For each language of interest, a pair of vector templates is first constructed through a training process. Orientation and category of the query image are then determined based on distances between the query document vector and the constructed vector templates. Experiments over 492 testing document images show that the average orientation detection and categorization rates reach up to 97.56% and 99.59%, respectively.