Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref, which is designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric, called rpref, has also been proposed. However, we show that applying Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness, and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power.
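To make the condensed-list idea concrete, the following is a minimal Python sketch (not the authors' code) contrasting bpref, as commonly implemented in trec_eval, with AveP and nDCG computed on condensed lists. The qrels, document ids and the log2(rank+1) nDCG discount are illustrative assumptions; the literature contains several nDCG formulations.

```python
import math

# `run` is a ranked list of document ids; `qrels` maps judged document ids
# to relevance grades (0 = judged nonrelevant); unjudged ids are absent.

def bpref(run, qrels):
    """bpref as commonly implemented (e.g., in trec_eval):
    (1/R) * sum over relevant retrieved r of 1 - min(n_above, R)/min(R, N)."""
    R = sum(1 for g in qrels.values() if g > 0)   # judged relevant
    N = len(qrels) - R                            # judged nonrelevant
    if R == 0:
        return 0.0
    nonrel_above = 0
    score = 0.0
    for doc in run:
        if doc not in qrels:
            continue                              # unjudged: ignored
        if qrels[doc] > 0:
            penalty = min(nonrel_above, R) / min(R, N) if N > 0 else 0.0
            score += 1.0 - penalty
        else:
            nonrel_above += 1
    return score / R

def condensed_ap(run, qrels):
    """AveP over the condensed list: unjudged documents filtered out first."""
    condensed = [doc for doc in run if doc in qrels]
    R = sum(1 for g in qrels.values() if g > 0)
    if R == 0:
        return 0.0
    rel_so_far = 0
    total = 0.0
    for rank, doc in enumerate(condensed, start=1):
        if qrels[doc] > 0:
            rel_so_far += 1
            total += rel_so_far / rank            # precision at this rank
    return total / R

def condensed_ndcg(run, qrels):
    """nDCG over the condensed list, using graded gains and the common
    gain/log2(rank+1) discount (one of several formulations)."""
    condensed = [doc for doc in run if doc in qrels]
    dcg = sum(qrels[doc] / math.log2(rank + 1)
              for rank, doc in enumerate(condensed, start=1))
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = sum(g / math.log2(rank + 1)
               for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical judgments: d1 highly relevant, d4 relevant; d2, d5, d6 unjudged.
    qrels = {"d1": 2, "d3": 0, "d4": 1, "d7": 0}
    run = ["d2", "d1", "d5", "d3", "d4", "d6"]
    print(f"bpref          = {bpref(run, qrels):.3f}")          # 0.750
    print(f"condensed AveP = {condensed_ap(run, qrels):.3f}")   # 0.833
    print(f"condensed nDCG = {condensed_ndcg(run, qrels):.3f}") # 0.950
```

Note how both condensed-list metrics simply discard the unjudged documents (d2, d5, d6) before scoring, whereas bpref additionally replaces rank positions with counts of judged nonrelevant documents; the paper's claim is that the former strategy ranks systems more stably as judgments become incomplete.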