Content-Based Indexing and Retrieval Method of Chinese Document Images

  • Authors:
  • Yaodong He;Zao Jiang;Bing Liu;Hong Zhao

  • Venue:
  • ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
  • Year:
  • 1999

Abstract

In Chinese information retrieval, indexing a Chinese text document is straightforward: the text only needs to be segmented into phrases. When the document is a Chinese document image (a non-ASCII file), it can first be converted into a text file using Chinese optical character recognition (OCR) and then indexed with an information retrieval algorithm. However, OCR is time-consuming, which reduces retrieval efficiency. This paper proposes an indexing method based on stroke density codes. The document image is first segmented into individual Chinese character images; the stroke density of each character image is then calculated; finally, the stroke density code of each character image is obtained. The indexing method is fast and robust to noise. The paper also presents a retrieval method for Chinese document images based on this indexing technique, with particular attention to indexing and retrieval for duplicate detection. The indexing method is validated through its application to keyword spotting and duplicate detection.
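
The abstract does not define the stroke density code, so the following Python sketch is only one plausible reading, not the authors' method. It assumes each segmented character is a small binary grid (1 = ink), takes "stroke density" to mean the average number of ink runs crossed by scan lines in a band, and quantizes those averages into a short tuple. The names `stroke_density_code`, `build_index`, and `spot_keyword`, and the parameters `bands` and `levels`, are hypothetical.

```python
from collections import defaultdict


def count_runs(line):
    """Count runs of ink pixels (1s) along one scan line."""
    runs, prev = 0, 0
    for px in line:
        if px and not prev:
            runs += 1
        prev = px
    return runs


def stroke_density_code(glyph, bands=2, levels=4):
    """Assumed code: quantized mean ink-run count per band, rows then columns."""
    rows = [list(r) for r in glyph]
    cols = [list(c) for c in zip(*glyph)]
    code = []
    for lines in (rows, cols):
        n = len(lines)
        for b in range(bands):
            band = lines[b * n // bands:(b + 1) * n // bands]
            mean = sum(count_runs(l) for l in band) / max(len(band), 1)
            # Round to the nearest integer and clamp to the top level.
            code.append(min(int(mean + 0.5), levels - 1))
    return tuple(code)


def build_index(doc_glyphs):
    """Inverted index: code -> list of (doc_id, position) occurrences."""
    index = defaultdict(list)
    for doc_id, glyphs in doc_glyphs.items():
        for pos, glyph in enumerate(glyphs):
            index[stroke_density_code(glyph)].append((doc_id, pos))
    return index


def spot_keyword(index, query_codes):
    """Return (doc_id, start) pairs where the non-empty query occurs consecutively."""
    hits = set(index.get(query_codes[0], []))
    for offset, code in enumerate(query_codes[1:], start=1):
        following = set(index.get(code, []))
        hits = {(d, p) for (d, p) in hits if (d, p + offset) in following}
    return sorted(hits)


# Two toy 5x5 glyphs: a cross shape and two horizontal bars.
CROSS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
BARS = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

docs = {"a": [CROSS, BARS, CROSS], "b": [BARS, BARS]}
index = build_index(docs)
query = [stroke_density_code(CROSS), stroke_density_code(BARS)]
matches = spot_keyword(index, query)
```

Because no character is recognized, matching reduces to comparing short codes, which is the kind of shortcut that could make such an index faster than OCR. Distinct characters can share a code under this toy quantization, so a practical system would likely need a finer code or a verification step.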