An indexed full-text search method of printed document images with an M-tree

  • Authors:
  • Hajime Imura; Yuzuru Tanaka

  • Affiliations:
  • Technology Hokkaido University, Sapporo, Japan;Technology Hokkaido University, Sapporo, Japan

  • Venue:
  • RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
  • Year:
  • 2010

Abstract

This paper describes an indexed full-text search method that finds occurrences of a specified character-string image in printed document images. The method combines N-gram-based indexing with an M-tree index structure. Such a search method is important for making collections of historical letterpress printing usable. The proposed method is independent of language and font because it uses a pseudo-coding scheme based on the statistical features of character shapes. Conventional word-spotting methods require a sequential scan of the whole document image and a matching calculation over the document's entire descriptor sequence. The proposed N-gram-based indexing method accelerates the search process with an M-tree. The method was evaluated in terms of search time and recall-precision curves for N-gram-based query strings. Our experiments demonstrated that the proposed approach improves search time by a factor of about one hundred.
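
The idea of indexing N-grams of character-shape descriptors in a metric structure can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each character image has already been reduced to a small numeric feature vector (the toy 2-D values below are hypothetical), it assumes Euclidean distance as the metric, and it replaces the M-tree with a much simpler single-level pivot index (the hypothetical `PivotIndex` class) that uses the same pruning principle: covering radii plus the triangle inequality.

```python
import math
import random


def ngrams(descriptors, n):
    """Slide a window of n consecutive character descriptors over the page
    and concatenate each window into one vector, keeping its start position."""
    return [
        (i, tuple(x for d in descriptors[i:i + n] for x in d))
        for i in range(len(descriptors) - n + 1)
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Simplified stand-in for an M-tree (an assumption for illustration):
    a single level of pivots, each with a covering radius."""

    def __init__(self, items, num_pivots=4, seed=0):
        rng = random.Random(seed)
        pivots = rng.sample(items, min(num_pivots, len(items)))
        self.nodes = [{"pivot": p[1], "radius": 0.0, "entries": []} for p in pivots]
        for pos, vec in items:
            # Assign every N-gram to its nearest pivot and remember the distance.
            node = min(self.nodes, key=lambda nd: euclidean(nd["pivot"], vec))
            d = euclidean(node["pivot"], vec)
            node["radius"] = max(node["radius"], d)
            node["entries"].append((pos, vec, d))

    def range_query(self, query, r):
        """Return start positions of all N-grams within distance r of query."""
        hits = []
        for node in self.nodes:
            dq = euclidean(node["pivot"], query)
            if dq > node["radius"] + r:
                continue  # the whole covering ball is too far from the query
            for pos, vec, dp in node["entries"]:
                if abs(dq - dp) > r:
                    continue  # triangle inequality rules this entry out
                if euclidean(vec, query) <= r:
                    hits.append(pos)
        return sorted(hits)


# Hypothetical toy page: one 2-D shape descriptor per character.
page = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
        (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

index = PivotIndex(ngrams(page, 3))

# A slightly noisy 3-gram query matching the windows starting at 0 and 4.
query = (0.1, 0.0, 1.0, 0.0, 0.0, 0.9)
print(index.range_query(query, 0.5))  # [0, 4]
```

Because the pruning rules never discard a true match, the range query returns the same positions as a sequential scan while skipping most distance computations on larger collections. A real M-tree applies the same tests recursively over a balanced, dynamically updated tree.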