Previous research has shown that features based on term proximity are important for effective retrieval. However, compared with term-based features, they incur substantial costs in the form of larger inverted indexes and slower query execution. This paper explores whether term proximity features based on approximate term positions are as effective as those based on exact term positions. We introduce the novel notion of approximate positional indexes, which divide documents into coarse-grained buckets and record term positions with respect to those buckets. We propose different approaches to defining the buckets and to compactly encoding bucket ids. In the context of linear ranking functions, experimental results show that features based on approximate term positions achieve effectiveness comparable to that of exact term positions, with smaller indexes and faster query evaluation.
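The idea of recording positions with respect to coarse-grained buckets can be illustrated with a minimal sketch. It assumes fixed-width buckets of `bucket_width` tokens and whitespace tokenization; both are illustrative assumptions, and the abstract does not say which bucket definitions or encodings the paper actually uses. The names `build_bucket_index` and `min_bucket_distance` are hypothetical.

```python
from collections import defaultdict


def build_bucket_index(docs, bucket_width=8):
    """Build term -> doc_id -> sorted list of distinct bucket ids.

    Assumption: buckets are fixed-width runs of `bucket_width` tokens.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            bucket = pos // bucket_width  # approximate position
            buckets = index[term].setdefault(doc_id, [])
            # Positions arrive in increasing order, so checking the last
            # entry is enough to store each bucket id only once.
            if not buckets or buckets[-1] != bucket:
                buckets.append(bucket)
    return index


def min_bucket_distance(index, doc_id, term_a, term_b):
    """Smallest gap between bucket ids of two terms in one document.

    Returns None if either term does not occur in the document.
    """
    a = index.get(term_a, {}).get(doc_id)
    b = index.get(term_b, {}).get(doc_id)
    if not a or not b:
        return None
    i = j = 0
    best = abs(a[0] - b[0])
    # Two-pointer scan over the two sorted bucket lists.
    while i < len(a) and j < len(b):
        best = min(best, abs(a[i] - b[j]))
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return best


docs = {"d1": "a b c d e f g h i j"}
index = build_bucket_index(docs, bucket_width=4)
print(min_bucket_distance(index, "d1", "a", "j"))
```

A distance of 0 here means only that the two terms share a bucket, not that they are adjacent; that loss of precision is the trade-off against storing one small bucket id per bucket instead of every exact position.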