At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multigrade relevance. Through a theoretical analysis, this paper shows that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision. It then verifies this claim experimentally by ranking the systems submitted to the NTCIR-3 CLIR Task. The experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that Q-measure is more reliable than Average Weighted Precision.
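The relationship described above can be made concrete with a small sketch. It assumes the formulations of these metrics commonly given in the graded-relevance literature. Each metric is a sum over the ranks r that hold relevant documents, divided by the number of relevant documents R. AP sums count(r)/r. AWP sums cg(r)/cig(r). Q-measure sums (β·cg(r) + count(r)) / (β·cig(r) + r). Here cg(r) and cig(r) are the cumulative gains of the system ranking and of an ideal ranking up to rank r, and count(r) is the number of relevant documents retrieved by rank r. The grade labels S/A/B, their gain values, the default β = 1, and all function names are assumptions for illustration, not values or code taken from this paper. Setting β = 0 reduces Q-measure to AP.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical gain value per relevance grade (an assumption, not from the paper).
GAINS: Dict[str, float] = {"S": 3.0, "A": 2.0, "B": 1.0}


def _ideal_gains(judgments: Dict[str, str]) -> List[float]:
    """Gains of all relevant documents sorted best-first: the ideal ranked list."""
    return sorted((GAINS[g] for g in judgments.values()), reverse=True)


def _walk(ranking: Sequence[str],
          judgments: Dict[str, str]) -> Iterator[Tuple[int, float, float, int]]:
    """Yield (r, cg(r), cig(r), count(r)) at each rank r holding a relevant doc.

    `judgments` maps each relevant document ID to its grade; any document
    not in it is treated as nonrelevant (gain 0).
    """
    ideal = _ideal_gains(judgments)
    cg = cig = 0.0
    count = 0
    for r, doc in enumerate(ranking, start=1):
        if r <= len(ideal):
            cig += ideal[r - 1]  # cig(r) stops growing after rank R
        grade = judgments.get(doc)
        if grade is None:
            continue  # nonrelevant: adds no gain and contributes no term
        cg += GAINS[grade]
        count += 1
        yield r, cg, cig, count


def average_precision(ranking: Sequence[str], judgments: Dict[str, str]) -> float:
    R = len(judgments)
    if R == 0:
        return 0.0
    return sum(count / r for r, _, _, count in _walk(ranking, judgments)) / R


def average_weighted_precision(ranking: Sequence[str], judgments: Dict[str, str]) -> float:
    R = len(judgments)
    if R == 0:
        return 0.0
    return sum(cg / cig for _, cg, cig, _ in _walk(ranking, judgments)) / R


def q_measure(ranking: Sequence[str], judgments: Dict[str, str], beta: float = 1.0) -> float:
    R = len(judgments)
    if R == 0:
        return 0.0
    return sum((beta * cg + count) / (beta * cig + r)
               for r, cg, cig, count in _walk(ranking, judgments)) / R


if __name__ == "__main__":
    qrels = {"d1": "S", "d2": "B"}  # toy judgments: R = 2
    early = ["d1", "d2"]
    late = ["d1"] + [f"n{i}" for i in range(1, 9)] + ["d2"]  # d2 at rank 10
    for name, run in (("early", early), ("late", late)):
        print(name,
              round(average_precision(run, qrels), 3),
              round(average_weighted_precision(run, qrels), 3),
              round(q_measure(run, qrels), 3))
    # early 1.0 1.0 1.0
    # late 0.6 1.0 0.714
```

In this toy run, AWP gives the late ranking the same score as the perfect one: cig(r) stops growing after rank R, so the ratio cg(r)/cig(r) cannot penalize a relevant document retrieved at rank 10. Q-measure keeps r in its denominator and, like AP, does penalize it. This is only an illustration of why Q-measure can track AP more closely than AWP does, not a reproduction of the paper's experiments.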