We propose a search method that uses sentence-based inverted indexes to achieve high accuracy at practical speeds. The method supports the vast majority of queries entered on the web: single-word queries, multi-word queries for proximity search, and semantically direct phrases. The existing approach, an inverted index that holds word-level position data, is inefficient because the index becomes extremely large. Our solution is to drop the word position data and index only the presence of each word in each sentence. We incorporate the sentence-based inverted index into a commercial search engine and evaluate it on standard Japanese and English IR corpora. The evaluation shows that our method offers high accuracy while greatly reducing index size and search processing time.
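
As a rough illustration of the idea of indexing only word presence per sentence, the following minimal Python sketch builds a toy sentence-level index and answers multi-word queries by intersecting postings. It is not the authors' implementation: the function names (`split_sentences`, `tokenize`, `build_index`, `search_same_sentence`), the regex-based sentence splitter, and the sample documents are all assumptions made for illustration, and Japanese text would need a proper segmenter instead of the naive tokenizer used here.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; a stand-in for real language analysis.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_index(docs):
    """Map each word to the set of (doc_id, sentence_no) pairs containing it.

    No word offsets are stored: a posting records only that the word
    occurs somewhere in that sentence.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sent_no, sentence in enumerate(split_sentences(text)):
            for word in set(tokenize(sentence)):
                index[word].add((doc_id, sent_no))
    return index


def search_same_sentence(index, query):
    """Return sorted (doc_id, sentence_no) pairs whose sentence has every query word."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "Inverted indexes map words to documents. Position data makes them large.",
    "d2": "Sentence indexes are small. They drop position data entirely.",
}
index = build_index(docs)

print(search_same_sentence(index, "position data"))   # both words in one sentence
print(search_same_sentence(index, "inverted large"))  # words in different sentences
```

In this sketch, co-occurrence within a sentence stands in for word-level proximity: "inverted" and "large" both appear in `d1` but in different sentences, so the second query returns no match. Whether that approximation is accurate enough in practice is what the evaluation described above addresses.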