Managing short postings lists

  • Authors:
  • Andrew Trotman;Xiang-Fei Jia;Matt Crane

  • Affiliations:
  • University of Otago, Dunedin, New Zealand;University of Otago, Dunedin, New Zealand;University of Otago, Dunedin, New Zealand

  • Venue:
  • Proceedings of the 18th Australasian Document Computing Symposium
  • Year:
  • 2013

Abstract

Previous work has examined space-saving and throughput-increasing techniques for long postings lists in an inverted file search engine. In this contribution we show that highly sporadic terms (terms that occur in only 1 or 2 documents) make up a high proportion of the unique terms in the collection and that these terms do appear in queries. We compare the previously known space-saving method of storing their short postings lists in the vocabulary with storing them in the postings file. We quantify the saving as about 6.5%, with no loss in precision, and suggest the adoption of this technique.
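
The idea of keeping short postings lists in the vocabulary can be illustrated with a minimal sketch. This is not the authors' implementation: the names `build_index`, `lookup` and `INLINE_LIMIT`, the in-memory dictionary standing in for the vocabulary, and the flat list standing in for the postings file are all assumptions made for illustration. A term that occurs in at most `INLINE_LIMIT` documents has its document ids stored directly in its vocabulary entry; any other term stores an offset and length that point into the postings file.

```python
from collections import defaultdict

# Assumption: a "short" postings list holds at most 2 document ids,
# matching the abstract's definition of a highly sporadic term.
INLINE_LIMIT = 2


def build_index(docs, inline_limit=INLINE_LIMIT):
    """Build (vocabulary, postings_file) from a dict of doc_id -> text.

    Hypothetical layout: the vocabulary maps each term either to
    ("inline", doc_ids) or to ("pointer", offset, length).
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].lower().split())):
            postings[term].append(doc_id)

    vocabulary = {}
    postings_file = []  # flat list standing in for an on-disk postings file
    for term, doc_ids in sorted(postings.items()):
        if len(doc_ids) <= inline_limit:
            # Short list: keep it in the vocabulary entry itself.
            vocabulary[term] = ("inline", tuple(doc_ids))
        else:
            # Long list: append to the postings file and keep a pointer.
            offset = len(postings_file)
            postings_file.extend(doc_ids)
            vocabulary[term] = ("pointer", offset, len(doc_ids))
    return vocabulary, postings_file


def lookup(term, vocabulary, postings_file):
    """Return the list of document ids for a term under either layout."""
    entry = vocabulary.get(term)
    if entry is None:
        return []
    if entry[0] == "inline":
        return list(entry[1])
    _, offset, length = entry
    return postings_file[offset:offset + length]
```

Calling `build_index(docs, inline_limit=0)` places every list in the postings file, which gives a baseline layout to compare against. Both layouts return the same document ids for every term, so search results are unaffected; only where the short lists are stored changes. The sketch does not model on-disk encoding, so it says nothing about the size of the saving reported in the abstract.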