Viewing morphology as an inference process. SIGIR '93: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management: an International Journal.
Ranking retrieval systems without relevance judgments. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
On the effectiveness of evaluating retrieval systems in the absence of relevance judgments. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Methods for ranking information retrieval systems without relevance judgments. Proceedings of the 2003 ACM Symposium on Applied Computing.
Scaling IR-system evaluation using term relevance sets. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Proceedings of the 14th ACM International Conference on Information and Knowledge Management.
Automatic ranking of information retrieval systems using data fusion. Information Processing and Management: an International Journal.
A statistical method for system evaluation using incomplete judgments. SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Information Processing and Management: an International Journal.
Hits hits TREC: exploring IR evaluation results with network analysis. SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Performance prediction using spatial autocorrelation. SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Using multiple query aspects to build test collections without human relevance judgments. ECIR '09: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval.
A few good topics: experiments in topic set reduction for retrieval evaluation. ACM Transactions on Information Systems (TOIS).
Query hardness estimation using Jensen-Shannon divergence among multiple scoring functions. ECIR '07: Proceedings of the 29th European Conference on IR Research.
Retrieval system evaluation: automatic evaluation versus incomplete judgments. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
Aspects and analysis of patent test collections. PaIR '10: Proceedings of the 3rd International Workshop on Patent Information Retrieval.
Exploring ant colony optimisation for adaptive interactive search. ICTIR '11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory.
Recent developments in information retrieval. ECIR 2010: Proceedings of the 32nd European Conference on Advances in Information Retrieval.
An uncertainty-aware query selection model for evaluation of IR systems. SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval.
On using fewer topics in information retrieval evaluations. Proceedings of the 2013 Conference on the Theory of Information Retrieval.
Ranking a set of retrieval systems according to their retrieval effectiveness without relying on relevance judgments was first explored by Soboroff et al. [13]. Over the years, a number of alternative approaches have been proposed, all of which have been evaluated on early TREC test collections. In this work, we perform a wider analysis of system ranking estimation methods on sixteen TREC data sets, which cover a broader range of tasks and corpora than previous studies. Our analysis reveals that the performance of system ranking estimation approaches varies across topics. This observation motivates the hypothesis that the performance of such methods can be improved by selecting the “right” subset of topics from a topic set. We show that using topic subsets improves the performance of automatic system ranking methods by 26% on average, with a maximum improvement of 60%. We also observe that the commonly reported problem of underestimating the performance of the best systems is data-set dependent and not inherent to system ranking estimation. These findings support the case for automatic system evaluation and motivate further research.
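As a rough illustration of the idea behind automatic system ranking estimation, the sketch below follows the spirit of the Soboroff et al. approach: documents sampled at random from the systems' pooled results are treated as pseudo-relevant, each system is scored against these pseudo-qrels, and the estimated system ranking can then be compared to the ranking obtained from real judgments via Kendall's tau. All data structures, function names, and parameter values (pool depth, sample rate, number of trials) are illustrative assumptions and are not taken from the paper.

# Minimal sketch of Soboroff-style automatic system ranking (assumed toy data layout).
import random
from itertools import combinations


def estimate_system_scores(runs, pool_depth=100, sample_rate=0.1, trials=50, seed=0):
    """runs: {system: {topic: [doc ids in rank order]}} -> {system: estimated score}."""
    rng = random.Random(seed)
    systems = list(runs)
    topics = sorted({t for per_topic in runs.values() for t in per_topic})
    scores = {s: 0.0 for s in systems}
    for _ in range(trials):
        for topic in topics:
            # Pool the top documents of all systems for this topic.
            pool = sorted({d for s in systems for d in runs[s].get(topic, [])[:pool_depth]})
            if not pool:
                continue
            # Sample pseudo-relevant documents uniformly at random from the pool.
            pseudo_rel = set(rng.sample(pool, max(1, int(sample_rate * len(pool)))))
            for s in systems:
                # Average precision of the system's ranking against the pseudo-qrels.
                hits, ap = 0, 0.0
                for rank, doc in enumerate(runs[s].get(topic, []), start=1):
                    if doc in pseudo_rel:
                        hits += 1
                        ap += hits / rank
                scores[s] += (ap / len(pseudo_rel)) / (trials * len(topics))
    return scores


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two orderings of the same set of system names."""
    pos_a = {s: i for i, s in enumerate(ranking_a)}
    pos_b = {s: i for i, s in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    agreement = sum(
        1 if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0 else -1
        for x, y in pairs
    )
    return agreement / len(pairs)

Under this sketch, the topic-subset idea described in the abstract would amount to restricting the topics considered in estimate_system_scores to a chosen subset and checking whether the Kendall's tau between the estimated and judged rankings improves.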