Proximity Scoring Using Sentence-Based Inverted Index for Practical Full-Text Search

  • Authors:
  • Yukio Uematsu;Takafumi Inoue;Kengo Fujioka;Ryoji Kataoka;Hayato Ohwada

  • Affiliations:
  • NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan and Tokyo University of Science, Chiba, Japan;NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;NTT Cyber Space Laboratories, NTT Corporation, Kanagawa, Japan;NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;Tokyo University of Science, Chiba, Japan

  • Venue:
  • ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2008

Abstract

We propose a search method that uses sentence-based inverted indexes to achieve high accuracy at practical speeds. The proposed method supports the vast majority of queries entered on the web: single words, multiple words for proximity searches, and semantically direct phrases. The existing approach, an inverted index that holds word-level position data, is inefficient because the index becomes extremely large. Our solution is to drop the word position data and index only the existence of each word in each sentence. We incorporate the sentence-based inverted index into a commercial search engine and evaluate it using standard Japanese and English IR corpora. The experiments show that our method offers high accuracy while greatly reducing index size and search processing time.
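
The core idea, recording only which sentences contain a word rather than where in the text each occurrence falls, might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (`split_sentences`, `tokenize`, `build_index`, `proximity_score`), the naive punctuation-based sentence splitter, and the scoring rule (counting sentences that contain every query word) are all assumptions made for the example. The abstract does not state the paper's actual scoring formula.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Assumption: a naive splitter that breaks after '.', '!' or '?'
    # followed by whitespace. Japanese text would need a different splitter.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Assumption: lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_index(docs):
    """Map each word to the set of (doc_id, sentence_no) pairs containing it.

    No word positions are stored, only sentence-level presence.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sent_no, sentence in enumerate(split_sentences(text)):
            for word in set(tokenize(sentence)):
                index[word].add((doc_id, sent_no))
    return index


def proximity_score(index, query):
    """Hypothetical score: per document, the number of sentences
    that contain every query word."""
    words = tokenize(query)
    if not words:
        return {}
    postings = [index.get(w, set()) for w in words]
    shared = set.intersection(*postings)
    scores = defaultdict(int)
    for doc_id, _sent_no in shared:
        scores[doc_id] += 1
    return dict(scores)


docs = {
    "d1": "Inverted index structures speed up search. Proximity matters for ranking.",
    "d2": "An index can be large. Search engines rank results. Proximity search uses an index.",
}
index = build_index(docs)
print(proximity_score(index, "proximity index"))
```

In this toy collection, both documents contain "proximity" and "index", but only `d2` has them in the same sentence, so only `d2` receives a score. Each posting stores one entry per sentence regardless of how often the word occurs in it, which is where the index-size saving over word-level positions would come from.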