Revisiting globally sorted indexes for efficient document retrieval

  • Authors:
  • Fan Zhang; Shuming Shi; Hao Yan; Ji-Rong Wen

  • Affiliations:
  • Nankai University, Tianjin, China; Microsoft Research Asia, Beijing, China; Polytechnic Institute of New York University, New York, USA; Microsoft Research Asia, Beijing, China

  • Venue:
  • Proceedings of the third ACM international conference on Web search and data mining
  • Year:
  • 2010

Abstract

There has been a large amount of research on efficient document retrieval in both the IR and web search communities. One important technique for improving retrieval efficiency is early termination, which speeds up query processing by avoiding a scan of the entire inverted lists. Most early termination techniques first build new inverted indexes by sorting the inverted lists by either term-dependent information (e.g., term frequencies or term IR scores) or term-independent information (e.g., the static rank of the document), and then apply appropriate retrieval strategies to the resulting indexes. Although methods based only on static rank have been shown to be ineffective for early termination, methods based on term-independent information still offer many advantages. In this paper, we propose new techniques for organizing inverted indexes based on term-independent information beyond static rank, and we study new retrieval strategies on the resulting indexes. We perform a detailed experimental evaluation of our new techniques and compare them with existing approaches. Our results on the TREC GOV and GOV2 data sets show that our techniques can improve query efficiency significantly.
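
To make the general idea of early termination on a globally sorted index concrete, the following is a minimal Python sketch. It is not the technique proposed in the paper, whose details the abstract does not give. It illustrates only the plain static-rank ordering that the abstract mentions as an existing approach. Several things here are assumptions made for illustration: document IDs are assigned in descending static-rank order, the final score is the static rank plus the sum of per-term scores, queries are conjunctive, and each term stores its maximum score. The names `build_index`, `search`, and the toy data are hypothetical.

```python
import heapq

def build_index(term_scores):
    """Build a toy index from term -> {doc_id: term_score}.

    Assumption: doc IDs were assigned in descending static-rank order, so
    sorting postings by doc ID sorts them globally by static rank.
    """
    return {
        term: {"docs": sorted(scores), "score": scores, "max": max(scores.values())}
        for term, scores in term_scores.items()
    }

def search(index, static_rank, query, k):
    """Return (top-k list of (score, doc_id), number of postings examined)."""
    # Drive the traversal from the shortest posting list.
    terms = sorted(query, key=lambda t: len(index[t]["docs"]))
    driver, others = terms[0], terms[1:]
    # Upper bound on the term-dependent part of any document's score.
    max_term_part = sum(index[t]["max"] for t in terms)
    heap = []      # min-heap holding the current top-k (score, doc_id) pairs
    examined = 0
    for doc in index[driver]["docs"]:
        # Static rank only decreases from here on, so no remaining document
        # can score above static_rank[doc] + max_term_part.
        if len(heap) == k and static_rank[doc] + max_term_part <= heap[0][0]:
            break  # early termination
        examined += 1
        if all(doc in index[t]["score"] for t in others):
            score = static_rank[doc] + sum(index[t]["score"][doc] for t in terms)
            if len(heap) < k:
                heapq.heappush(heap, (score, doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), examined

# Toy data (hypothetical): integer scores, doc 0 has the highest static rank.
static_rank = [50, 40, 30, 20, 10, 5]
term_scores = {
    "web": {0: 2, 1: 9, 2: 4, 3: 1, 4: 3, 5: 8},
    "search": {0: 5, 1: 1, 3: 7, 5: 2},
}
index = build_index(term_scores)
query = ["web", "search"]
results, scanned = search(index, static_rank, query, k=2)
print(results, scanned)  # [(57, 0), (50, 1)] 2
```

In this toy run the traversal stops after examining two of the four postings in the shorter list, because the best possible score of any remaining document (20 + 16 = 36) cannot exceed the current second-best score of 50. The stopping test depends on static rank dominating the term scores, which is consistent with the abstract's remark that static rank alone is often ineffective for early termination.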