When close enough is good enough: approximate positional indexes for efficient ranked retrieval

  • Authors: Tamer Elsayed; Jimmy Lin; Donald Metzler

  • Affiliations: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; University of Maryland, College Park, MD, USA; University of Southern California, Marina del Rey, CA, USA

  • Venue: Proceedings of the 20th ACM International Conference on Information and Knowledge Management

  • Year: 2011

Abstract

Previous research has shown that features based on term proximity are important for effective retrieval. However, they incur substantial costs in terms of larger inverted indexes and slower query execution times as compared to term-based features. This paper explores whether term proximity features based on approximate term positions are as effective as those based on exact term positions. We introduce the novel notion of approximate positional indexes based on dividing documents into coarse-grained buckets and recording term positions with respect to those buckets. We propose different approaches to defining the buckets and compactly encoding bucket ids. In the context of linear ranking functions, experimental results show that features based on approximate term positions are able to achieve effectiveness comparable to exact term positions, but with smaller indexes and faster query evaluation.
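
To make the bucketing idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract mentions several ways of defining buckets and encoding bucket ids, and this example assumes the simplest one, fixed-width buckets with ids stored as plain integers. The names `build_bucketed_index`, `min_bucket_distance`, and `bucket_width` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_bucketed_index(docs, bucket_width):
    """Build an approximate positional index.

    Assumption: buckets are fixed-width windows of `bucket_width` tokens.
    Returns a mapping term -> {doc_id: sorted list of bucket ids}.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        buckets_by_term = defaultdict(set)
        for position, term in enumerate(text.lower().split()):
            # Record only the bucket the term falls in, not its exact position.
            buckets_by_term[term].add(position // bucket_width)
        for term, buckets in buckets_by_term.items():
            index[term][doc_id] = sorted(buckets)
    return dict(index)


def min_bucket_distance(index, doc_id, term_a, term_b):
    """Approximate proximity: smallest gap between the two terms' bucket ids.

    Returns None if either term is absent from the document.
    """
    buckets_a = index.get(term_a, {}).get(doc_id)
    buckets_b = index.get(term_b, {}).get(doc_id)
    if not buckets_a or not buckets_b:
        return None
    return min(abs(a - b) for a in buckets_a for b in buckets_b)


# Hypothetical one-document collection, used only to exercise the sketch.
docs = {
    "d1": "approximate positional indexes trade precision for smaller "
          "indexes and faster ranked retrieval",
}
index = build_bucketed_index(docs, bucket_width=4)
```

With a bucket width of 4, the two occurrences of "indexes" (token positions 2 and 7) are stored as bucket ids 0 and 1, so the posting holds at most one small integer per bucket rather than one integer per occurrence. A proximity feature for a linear ranking function could then be computed from `min_bucket_distance` instead of exact token distance, which is the kind of trade-off between index size and positional precision the abstract describes.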