The existence and use of standard test collections in information retrieval experimentation allows results to be compared between research groups and over time. Such comparisons, however, are rarely made. Most researchers report results only from their own experiments, a practice that allows a lack of overall improvement to go unnoticed. In this paper, we analyze results achieved on the TREC Ad-Hoc, Web, Terabyte, and Robust collections as reported in SIGIR (1998--2008) and CIKM (2004--2008). Dozens of individual published experiments report effectiveness improvements, often with claims of statistical significance. However, there is little evidence of improvement in ad-hoc retrieval technology over the past decade. Baselines are generally weak, often falling below the median of the original TREC systems, and only a handful of experiments exceed the score of the best TREC automatic run. Given this finding, we question the value of achieving even a statistically significant result over a weak baseline. We propose that the community adopt a practice of regular longitudinal comparison to ensure measurable progress, or at least to prevent the lack of it from going unnoticed. We describe an online database of retrieval runs that facilitates such a practice.
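The significance tests the abstract alludes to are typically paired tests over per-topic effectiveness scores (e.g. average precision) for a proposed system versus its baseline. As a minimal illustration of the point, the sketch below implements a standard paired randomization (sign-flipping) test in plain Python; the function name and score lists are hypothetical, not taken from the paper.

```python
import random

def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-topic effectiveness
    scores (e.g. per-topic average precision for two runs).

    Under the null hypothesis the systems are exchangeable, so the sign
    of each per-topic difference is flipped at random; the p-value is
    the fraction of permutations whose mean absolute difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(trials):
        # Randomly flip the sign of each per-topic difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    return hits / trials
```

Note that such a test speaks only to the reliability of the difference between the two runs supplied, not to the strength of the baseline: a significant improvement over a run below the TREC median says nothing about whether the best original TREC automatic run has been exceeded, which is the paper's central caution.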