Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref, which is designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric, called rpref, has also been proposed. However, we show that applying Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness, and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power.
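To make the condensed-list idea concrete, the following is a minimal Python sketch (not the authors' code) contrasting bpref, as commonly implemented in trec_eval, with AveP and nDCG computed on condensed lists. The qrels, document ids and the log2(rank+1) nDCG discount are illustrative assumptions; the literature contains several nDCG formulations.

```python
import math

# `run` is a ranked list of document ids; `qrels` maps judged document ids
# to relevance grades (0 = judged nonrelevant); unjudged ids are absent.

def bpref(run, qrels):
    """bpref as commonly implemented (e.g., in trec_eval):
    (1/R) * sum over relevant retrieved r of 1 - min(n_above, R)/min(R, N)."""
    R = sum(1 for g in qrels.values() if g > 0)   # judged relevant
    N = len(qrels) - R                            # judged nonrelevant
    if R == 0:
        return 0.0
    nonrel_above = 0
    score = 0.0
    for doc in run:
        if doc not in qrels:
            continue                              # unjudged: ignored
        if qrels[doc] > 0:
            penalty = min(nonrel_above, R) / min(R, N) if N > 0 else 0.0
            score += 1.0 - penalty
        else:
            nonrel_above += 1
    return score / R

def condensed_ap(run, qrels):
    """AveP over the condensed list: unjudged documents filtered out first."""
    condensed = [doc for doc in run if doc in qrels]
    R = sum(1 for g in qrels.values() if g > 0)
    if R == 0:
        return 0.0
    rel_so_far = 0
    total = 0.0
    for rank, doc in enumerate(condensed, start=1):
        if qrels[doc] > 0:
            rel_so_far += 1
            total += rel_so_far / rank            # precision at this rank
    return total / R

def condensed_ndcg(run, qrels):
    """nDCG over the condensed list, using graded gains and the common
    gain/log2(rank+1) discount (one of several formulations)."""
    condensed = [doc for doc in run if doc in qrels]
    dcg = sum(qrels[doc] / math.log2(rank + 1)
              for rank, doc in enumerate(condensed, start=1))
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = sum(g / math.log2(rank + 1)
               for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical judgments: d1 highly relevant, d4 relevant; d2, d5, d6 unjudged.
    qrels = {"d1": 2, "d3": 0, "d4": 1, "d7": 0}
    run = ["d2", "d1", "d5", "d3", "d4", "d6"]
    print(f"bpref          = {bpref(run, qrels):.3f}")          # 0.750
    print(f"condensed AveP = {condensed_ap(run, qrels):.3f}")   # 0.833
    print(f"condensed nDCG = {condensed_ndcg(run, qrels):.3f}") # 0.950
```

Note how both condensed-list metrics simply discard the unjudged documents (d2, d5, d6) before scoring, whereas bpref additionally replaces rank positions with counts of judged nonrelevant documents; the paper's claim is that the former strategy ranks systems more stably as judgments become incomplete.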