Efficient construction of large test collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Relevance ranking for one to three term queries
Information Processing and Management: an International Journal
Variations in relevance judgments and the measurement of retrieval effectiveness
Information Processing and Management: an International Journal
Scaling Up the TREC Collection
Information Retrieval
Ranking retrieval systems without relevance judgments
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
On Collection Size and Retrieval Effectiveness
Information Retrieval
On Scalable Information Retrieval Systems
NCA '03 Proceedings of the Second IEEE International Symposium on Network Computing and Applications
Replicating Web Structure in Small-Scale Test Collections
Information Retrieval
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Term proximity scoring for keyword-based retrieval systems
ECIR'03 Proceedings of the 25th European conference on IR research
Hi-index | 0.00 |
Few works in Information Retrieval (IR) tackled the questions of Information Retrieval Systems (IRS) effectiveness and efficiency in the context of scalability in corpus size. We propose a general experimental methodology to study the scalability influence on IR models. This methodology is based on the construction of a collection on which a given characteristic C is the same whatever be the portion of collection selected. This new collection called uniform can be split into sub-collection of growing size on which some given properties will be studied. We apply our methodology to WT10G (TREC9 collection) and consider the characteristic C to be the distribution of relevant documents on a collection. We build a uniform WT10G, sample it into sub-collections of increasing size and use these sub-collections to study the impact of corpus volume increase on standards IRS evaluation measures (recall/precision, high precision).