Document Image Retrieval through Word Shape Coding

  • Authors:
  • Shijian Lu; Linlin Li; Chew Lim Tan

  • Affiliations:
  • A*STAR, Singapore; National University of Singapore, Singapore; National University of Singapore, Singapore

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2008

Abstract

This paper presents a document retrieval technique that searches document images without optical character recognition (OCR). The proposed technique retrieves document images using a new word shape coding scheme, which captures document content by annotating each word image with a word shape code. In particular, we annotate word images using a set of topological shape features, including character ascenders/descenders, character holes, and character water reservoirs. Using the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
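The core idea (annotate each word with a shape code, then match codes instead of recognized characters) can be illustrated with a minimal Python sketch. It rests on several loudly stated assumptions. The codebook `SHAPE_CLASS` is hypothetical and far coarser than the paper's. Words are encoded from text characters, the way a query keyword might be converted, rather than from features extracted from word images. Document-to-document comparison uses cosine similarity of code-frequency vectors, which is an assumed choice not stated in the abstract. All function names are illustrative.

```python
"""Illustrative word shape coding sketch (hypothetical codebook, not the paper's)."""
from collections import Counter
import math

# Hypothetical mapping from lowercase letters to coarse shape classes:
#   'A' ascender, no hole           'B' ascender with a hole (b, d)
#   'D' descender, no hole          'P' descender with a hole (g, p, q)
#   'o' x-height with a hole        'u' x-height with a top-open water reservoir
#   'x' any other x-height letter
SHAPE_CLASS = {}
for chars, code in [
    ("fhklt", "A"), ("bd", "B"),
    ("jy", "D"), ("gpq", "P"),
    ("aeo", "o"), ("uvw", "u"),
    ("cimnrsxz", "x"),
]:
    for ch in chars:
        SHAPE_CLASS[ch] = code


def word_shape_code(word: str) -> str:
    """Encode a word as a string of shape classes.

    Assumption: input is text, lowercased first (real uppercase glyphs have
    different shapes); characters outside the codebook are skipped.
    """
    return "".join(SHAPE_CLASS[ch] for ch in word.lower() if ch in SHAPE_CLASS)


def find_keyword(query: str, annotated_words: list) -> list:
    """Keyword retrieval: positions whose shape code equals the query's code."""
    target = word_shape_code(query)
    return [i for i, code in enumerate(annotated_words) if code == target]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Assumed document similarity: cosine of word-shape-code frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Stand-in for a document whose word images were already annotated.
    words = "the shape of each word is coded".split()
    codes = [word_shape_code(w) for w in words]
    print(codes)
    print(find_keyword("shape", codes))
    print(cosine_similarity(Counter(codes), Counter(codes)))
```

In this sketch `cat` and `sat` receive the same code, which shows that word shape codes are not unique; a coarser codebook makes such collisions more frequent.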