How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance judgments and the measurement of retrieval effectiveness
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation measure stability
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Ranking retrieval systems without relevance judgments
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
On the effectiveness of evaluating retrieval systems in the absence of relevance judgments
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Methods for ranking information retrieval systems without relevance judgments
Proceedings of the 2003 ACM symposium on Applied computing
Automatic ranking of information retrieval systems using data fusion
Information Processing and Management: an International Journal
Minimal test collections for retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Information Processing and Management: an International Journal
Hi-index | 0.07 |
Retrieval evaluation without relevance judgments is a hard but also very meaningful work. In this paper, we use clustering technique to improve the performance of judgment free retrieval evaluation. By using one system to represent all the systems that are similar to it, we can largely reduce the negative effect of similar retrieval results in Retrieval evaluation. Experimental results demonstrated that our method outperformed all the previous judgment free evaluation methods significantly. Its overall average performance outperformed the best previous result by 20.5%. Besides, our work is a general framework that can be applied to any other judgment free evaluation method for performance improvement.