Scalability influence on retrieval models: an experimental methodology

  • Authors:
  • Amélie Imafouo;Michel Beigbeder

  • Affiliations:
  • Ecole Nationale Supérieure des Mines de Saint-Etienne, Saint-Etienne, Cedex 2, France;Ecole Nationale Supérieure des Mines de Saint-Etienne, Saint-Etienne, Cedex 2, France

  • Venue:
  • ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
  • Year:
  • 2005

Abstract

Few works in Information Retrieval (IR) have addressed the effectiveness and efficiency of Information Retrieval Systems (IRS) as corpus size scales. We propose a general experimental methodology for studying the influence of scalability on IR models. The methodology is based on constructing a collection in which a given characteristic C is the same regardless of which portion of the collection is selected. This new collection, called uniform, can be split into sub-collections of growing size on which given properties are studied. We apply the methodology to WT10G (the TREC9 collection), taking the characteristic C to be the distribution of relevant documents over the collection. We build a uniform WT10G, sample it into sub-collections of increasing size, and use these sub-collections to study the impact of increasing corpus volume on standard IRS evaluation measures (recall/precision, high precision).
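
The abstract does not spell out how the uniform collection is built, so the following is only a minimal sketch of one way the idea could be realised, not the authors' procedure. It assumes a single set of relevant document identifiers (the paper presumably works with per-topic relevance judgments) and deals relevant and non-relevant documents round-robin into equal slices, so that each cumulative union of slices keeps roughly the same proportion of relevant documents. The function names `build_uniform_slices` and `nested_subcollections`, and the toy document identifiers, are hypothetical.

```python
import random


def build_uniform_slices(doc_ids, relevant_ids, n_slices, seed=0):
    """Deal documents into n_slices so that every slice receives a
    near-equal share of relevant and of non-relevant documents.

    Assumption: relevance is a single flat set of ids; this is an
    illustrative simplification, not the construction from the paper.
    """
    rng = random.Random(seed)
    relevant = [d for d in doc_ids if d in relevant_ids]
    others = [d for d in doc_ids if d not in relevant_ids]
    rng.shuffle(relevant)
    rng.shuffle(others)

    slices = [[] for _ in range(n_slices)]
    # Round-robin dealing keeps the per-slice counts within one of each other.
    for i, doc in enumerate(relevant):
        slices[i % n_slices].append(doc)
    for i, doc in enumerate(others):
        slices[i % n_slices].append(doc)
    return slices


def nested_subcollections(slices):
    """Return cumulative unions of the slices: sub-collections of
    increasing size, each containing the previous one."""
    subcollections = []
    current = []
    for s in slices:
        current = current + s
        subcollections.append(list(current))
    return subcollections


if __name__ == "__main__":
    # Toy data: 100 documents, every tenth one judged relevant.
    docs = [f"d{i}" for i in range(100)]
    relevant = {f"d{i}" for i in range(0, 100, 10)}

    subs = nested_subcollections(build_uniform_slices(docs, relevant, 5))
    for sub in subs:
        n_rel = sum(1 for d in sub if d in relevant)
        print(len(sub), n_rel)
```

On this toy input each sub-collection holds exactly 10% relevant documents (2 of 20, 4 of 40, and so on up to 10 of 100), which is the kind of invariance the characteristic C is meant to capture. Evaluation measures such as precision at a cutoff could then be computed on each sub-collection in turn.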