This paper addresses the problem of ranking retrieval systems without human relevance judgments, which are very resource-intensive to obtain. Using TREC 3, 6, 7 and 8 data, it is shown how the overlap structure between the search results of multiple systems can be used to infer relative performance differences. In particular, overlap structures are computed for random groupings of five systems, arranged so that each system is selected an equal number of times. The average percentage of a system's documents that are found by it and no other system turns out to be strongly and negatively correlated with its retrieval effectiveness, as measured by mean average precision or precision at 1000. The method thus uses the degree of consensus or agreement a retrieval system generates among its peers to infer its quality. The paper also addresses how many documents in a ranked list need to be examined to rank the systems: the overlap structure of the top 50 documents suffices, and often produces the best results. The presented method significantly improves upon previous attempts to rank retrieval systems without human relevance judgments. This "structure of overlap" method can be of value to communities that need to identify or rank the best experts but lack the resources to evaluate the experts' recommendations, since it requires no knowledge of the domain being searched or the information being requested.
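Although the paper's exact protocol may differ, the following minimal Python sketch illustrates the core computation: for balanced random groups of five systems, measure the average fraction of each system's top-k documents that no other system in the group retrieves, then rank systems by ascending uniqueness (lower uniqueness implying higher predicted effectiveness). The run format and all names here (runs, uniqueness_score, rank_systems) are assumptions for illustration, not the authors' code.

import random
from collections import defaultdict

def uniqueness_score(system, group, runs, depth=50):
    # Average over queries of the fraction of `system`'s top-`depth`
    # documents that no other system in `group` retrieves in its top-`depth`.
    fractions = []
    for query, ranking in runs[system].items():
        top = set(ranking[:depth])
        if not top:
            continue
        others = set()
        for other in group:
            if other != system:
                others.update(runs[other].get(query, [])[:depth])
        fractions.append(len(top - others) / len(top))
    return sum(fractions) / len(fractions) if fractions else 0.0

def rank_systems(runs, group_size=5, rounds=50, depth=50, seed=0):
    # Each round, shuffle the systems and partition the shuffle into
    # disjoint groups, so every system is selected (roughly) equally often.
    rng = random.Random(seed)
    systems = list(runs)
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(rounds):
        rng.shuffle(systems)
        for i in range(0, len(systems) - group_size + 1, group_size):
            group = systems[i:i + group_size]
            for s in group:
                totals[s] += uniqueness_score(s, group, runs, depth)
                counts[s] += 1
    avg = {s: totals[s] / counts[s] for s in systems if counts[s]}
    # Ascending uniqueness: systems whose results are most corroborated
    # by their peers come first and are predicted to be most effective.
    return sorted(avg, key=avg.get)

Here, runs would map each system name to a dict from query ID to that system's ranked list of document IDs; setting depth=50 mirrors the finding that examining the top 50 documents often suffices.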