Proximity Scoring Using Sentence-Based Inverted Index for Practical Full-Text Search

Authors:
Yukio Uematsu;Takafumi Inoue;Kengo Fujioka;Ryoji Kataoka;Hayato Ohwada
Affiliations:
NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan and Tokyo University of Science, Chiba, Japan;NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;NTT Cyber Space Laboratories, NTT Corporation, Kanagawa, Japan;NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;Tokyo University of Science, Chiba, Japan
Venue:
ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
Year:
2008

Abstract

We propose a search method that uses sentence-based inverted indexes to achieve high accuracy at practical speeds. The method supports the vast majority of queries entered on the web: single words, multiple words for proximity searches, and semantically direct phrases. The existing approach, an inverted index that holds word-level position data, is inefficient because the index becomes extremely large. Our solution is to drop the word position data and index only the existence of each word in each sentence. We incorporate the sentence-based inverted index into a commercial search engine and evaluate it using standard Japanese and English IR corpora. The experiments show that our method offers high accuracy while greatly reducing index size and search processing time.
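
The core idea, recording only which sentences contain each word rather than where the word occurs, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the naive sentence splitter, the function names `split_sentences`, `build_index`, and `score`, and in particular the scoring rule (the number of sentences in which all query words co-occur), since the abstract does not state the scoring function actually used.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_index(docs):
    """Map each word to the set of (doc_id, sentence_no) pairs containing it.

    No word offsets are stored; only the presence of a word in a sentence.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sent_no, sentence in enumerate(split_sentences(text)):
            for word in re.findall(r"\w+", sentence.lower()):
                index[word].add((doc_id, sent_no))
    return index


def score(index, query):
    """Hypothetical proximity score: per document, count the sentences
    that contain every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return {}
    postings = [index.get(w, set()) for w in words]
    common = set.intersection(*postings)
    scores = defaultdict(int)
    for doc_id, _sent_no in common:
        scores[doc_id] += 1
    return dict(scores)


docs = {
    "d1": "Full-text search needs an index. The inverted index maps words to sentences.",
    "d2": "An inverted list is short. This index stores no positions.",
}
index = build_index(docs)
result = score(index, "inverted index")
print(result)
```

In this toy collection both documents contain both query words, but only `d1` has them in the same sentence, so only `d1` receives a score. Sentence co-occurrence stands in for word-level proximity here without any position data in the postings.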