Incremental updates of inverted lists for text document retrieval

  • Authors:
  • Anthony Tomasic; Héctor García-Molina; Kurt Shoens

  • Affiliations:
  • Stanford University, Department of Computer Science, Stanford, CA; Stanford University, Department of Computer Science, Stanford, CA; IBM Almaden Research Center, 650 Harry Road, San Jose, CA

  • Venue:
  • SIGMOD '94: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 1994

Abstract

With the proliferation of the world's “information highways,” interest in efficient document indexing techniques has been renewed. This paper addresses the problem of incremental updates of inverted lists using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, we describe a space of engineering trade-offs that ranges from optimizing update time to optimizing query performance. We explore this space quantitatively by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for each of a variety of criteria.
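The abstract does not spell out how the dual-structure index is laid out, so the following is only a minimal in-memory sketch of the general idea of keeping short and long inverted lists in separate structures and moving a list from one to the other as it grows. The class name `DualStructureIndex`, the `long_threshold` parameter, and the migration rule are all hypothetical choices made for illustration; they are not taken from the paper, which concerns disk-resident lists and measured trade-offs.

```python
from collections import defaultdict


class DualStructureIndex:
    """Toy in-memory index that keeps short and long posting lists apart.

    Illustrative only: names and the threshold rule are assumptions,
    not the authors' design.
    """

    def __init__(self, long_threshold=4):
        # long_threshold is a hypothetical tuning knob, not a value from the paper.
        self.long_threshold = long_threshold
        self.short_lists = defaultdict(list)  # word -> small posting list
        self.long_lists = {}                  # word -> large posting list

    def add_document(self, doc_id, text):
        """Incrementally add one document's postings to the index."""
        for word in set(text.lower().split()):
            if word in self.long_lists:
                # Word already has a long list: append in place.
                self.long_lists[word].append(doc_id)
                continue
            postings = self.short_lists[word]
            postings.append(doc_id)
            # Migrate the list to the long structure once it outgrows the threshold.
            if len(postings) > self.long_threshold:
                self.long_lists[word] = self.short_lists.pop(word)

    def lookup(self, word):
        """Return a copy of the posting list, whichever structure holds it."""
        word = word.lower()
        if word in self.long_lists:
            return list(self.long_lists[word])
        return list(self.short_lists.get(word, []))

    def is_long(self, word):
        """Report whether the word's list currently lives in the long structure."""
        return word.lower() in self.long_lists


if __name__ == "__main__":
    index = DualStructureIndex(long_threshold=2)
    index.add_document(1, "the cat")
    index.add_document(2, "the dog")
    index.add_document(3, "the bird")
    print(index.lookup("the"), index.is_long("the"))
    print(index.lookup("cat"), index.is_long("cat"))
```

In a sketch like this, a lower threshold moves lists into the long structure sooner and a higher one keeps more lists in the short structure; which setting favors updates or queries depends on how each structure is stored, which this toy version does not model.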