Scalability influence on retrieval models: an experimental methodology

Authors:
Amélie Imafouo;Michel Beigbeder
Affiliations:
Ecole Nationale Supérieure des Mines de Saint-Etienne, Saint-Etienne, Cedex 2, France;Ecole Nationale Supérieure des Mines de Saint-Etienne, Saint-Etienne, Cedex 2, France
Venue:
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Year:
2005

Abstract

Few works in Information Retrieval (IR) have tackled the questions of Information Retrieval System (IRS) effectiveness and efficiency in the context of scalability in corpus size. We propose a general experimental methodology to study the influence of scalability on IR models. This methodology is based on the construction of a collection in which a given characteristic C is the same whatever portion of the collection is selected. This new collection, called uniform, can be split into sub-collections of growing size on which given properties are studied. We apply our methodology to WT10G (the TREC9 collection) and take the characteristic C to be the distribution of relevant documents in the collection. We build a uniform WT10G, sample it into sub-collections of increasing size, and use these sub-collections to study the impact of increasing corpus volume on standard IRS evaluation measures (recall/precision, high precision).
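The abstract does not say how the uniform collection is actually built, so the following is only an illustrative sketch of one possible reading: order the documents so that every prefix of the ordering holds the same share of relevant documents (up to rounding), then take prefixes of growing size as sub-collections. The function names `build_uniform_order` and `nested_subcollections`, the pooling of relevant documents across all topics, the random shuffling, and the choice of nested prefixes are all assumptions made for this example and are not taken from the paper.

```python
import random
from typing import Hashable, List, Sequence, Set


def build_uniform_order(
    doc_ids: Sequence[Hashable], relevant: Set[Hashable], seed: int = 0
) -> List[Hashable]:
    """Order documents so each prefix has the same share of relevant ones.

    Hypothetical helper: a prefix of length i contains exactly
    floor(i * R / T) relevant documents, where R is the number of
    relevant documents and T the collection size.
    """
    rng = random.Random(seed)
    rel = [d for d in doc_ids if d in relevant]
    non = [d for d in doc_ids if d not in relevant]
    # Shuffle within each group so the order carries no other bias (assumption).
    rng.shuffle(rel)
    rng.shuffle(non)

    total = len(rel) + len(non)
    ordered: List[Hashable] = []
    r = n = 0  # relevant / non-relevant documents placed so far
    for i in range(1, total + 1):
        target = (i * len(rel)) // total  # relevant docs wanted in this prefix
        if r < target:
            ordered.append(rel[r])
            r += 1
        else:
            ordered.append(non[n])
            n += 1
    return ordered


def nested_subcollections(
    ordered: Sequence[Hashable], n_parts: int
) -> List[List[Hashable]]:
    """Return n_parts prefixes of growing size; the last is the full collection."""
    size = len(ordered)
    return [list(ordered[: (k * size) // n_parts]) for k in range(1, n_parts + 1)]


if __name__ == "__main__":
    # Toy data: 100 documents, 10 of them judged relevant.
    docs = [f"d{i}" for i in range(100)]
    rel = {f"d{i}" for i in range(0, 100, 10)}
    order = build_uniform_order(docs, rel, seed=1)
    for sub in nested_subcollections(order, 4):
        print(len(sub), sum(d in rel for d in sub))
```

Under these assumptions, each sub-collection keeps a proportion of relevant documents close to that of the full collection, so differences in recall/precision across sizes would not simply reflect a change in the density of relevant documents.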