Collection selection: ...now, with more documents!

  • Authors:
  • Diego Puppin

  • Venue:
  • Proceedings of the 3rd International Conference on Scalable Information Systems
  • Year:
  • 2008

Abstract

One way to reduce the computational load in a distributed IR system is to partition the documents into collections and perform collection selection. With suitable training and/or modeling, the collection selection function can choose the most promising collections for each query with high confidence. Unfortunately, when the collections need to be updated, we must retrain the selection function, update its statistics, or accept some loss in result quality. This paper introduces a simple but very effective technique for adding new documents to collections in a system that uses collection selection. We show that the individual collections can be updated while guaranteeing the same selection performance, with no need to update or retrain the selection function.