Ranking the NTCIR systems based on multigrade relevance

Authors:
Tetsuya Sakai
Affiliations:
Toshiba Corporate R&D Center, Kawasaki, Japan
Venue:
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Year:
2004

Citing 3

IR evaluation methods for retrieving highly relevant documents
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)

Measuring retrieval effectiveness: a new proposal and a first experimental validation
Journal of the American Society for Information Science and Technology

Cited 9

Flexible pseudo-relevance feedback via selective sampling
ACM Transactions on Asian Language Information Processing (TALIP)

Rank-biased precision for measurement of retrieval effectiveness
ACM Transactions on Information Systems (TOIS)

Click-based evidence for decaying weight distributions in search effectiveness metrics
Information Retrieval

Extending average precision to graded relevance judgments
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

The reliability of metrics based on graded relevance
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology

On effectiveness measures and relevance functions in ranking INEX systems
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology

Retrieval of logically relevant 3D human motions by Adaptive Feature Selection with Graded Relevance Feedback
Pattern Recognition Letters

Measures for benchmarking semantic web service matchmaking correctness
ESWC'10 Proceedings of the 7th international conference on The Semantic Web: Research and Applications - Volume Part II

Information quality measurement of medical encoding support based on usability
Computer Methods and Programs in Biomedicine

Abstract

At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multigrade relevance. Through a theoretical analysis, this paper shows that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision, and then verifies this claim experimentally by ranking the systems submitted to the NTCIR-3 CLIR Task. The experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that Q-measure is more reliable than Average Weighted Precision.
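
To make the relationship between the three metrics named in the abstract concrete, the following is a minimal sketch for a single topic. It assumes the definitions commonly given for these metrics: Average Precision sums count(r)/r over relevant ranks, Average Weighted Precision sums cg(r)/cig(r), and Q-measure sums (cg(r) + count(r))/(cig(r) + r), each divided by the number of relevant documents R. The function names and the gain values (3, 2, 1 for decreasing relevance levels) are illustrative assumptions, not taken from this record.

```python
from typing import List, Sequence, Tuple


def ideal_gains(relevant_gains: Sequence[int], length: int) -> List[int]:
    """Gains of an ideal ranking: relevant documents in decreasing gain order, zero-padded."""
    ideal = sorted((g for g in relevant_gains if g > 0), reverse=True)
    return (ideal + [0] * length)[:length]


def ap_awp_q(ranked_gains: Sequence[int],
             relevant_gains: Sequence[int]) -> Tuple[float, float, float]:
    """Return (AP, AWP, Q-measure) for one topic.

    ranked_gains:   gain of the document at each rank of the system output
                    (0 means nonrelevant).
    relevant_gains: gains of all relevant documents for the topic (assumed values).
    """
    num_rel = len(relevant_gains)
    if num_rel == 0:
        raise ValueError("topic has no relevant documents")
    ideal = ideal_gains(relevant_gains, len(ranked_gains))

    cg = cig = count = 0          # cumulative gain, ideal cumulative gain, relevant docs seen
    ap = awp = q = 0.0
    for rank, (gain, ideal_gain) in enumerate(zip(ranked_gains, ideal), start=1):
        cg += gain
        cig += ideal_gain
        if gain > 0:              # only relevant ranks contribute
            count += 1
            ap += count / rank
            awp += cg / cig
            q += (cg + count) / (cig + rank)
    return ap / num_rel, awp / num_rel, q / num_rel


# One highly relevant document (assumed gain 3) retrieved at rank 5.
ap, awp, q = ap_awp_q([0, 0, 0, 0, 3], [3])
print(f"AP={ap:.2f} AWP={awp:.2f} Q={q:.2f}")
```

In this example AWP still scores 1.0 even though the only relevant document appears at rank 5, because the ideal cumulative gain stops growing after rank R. Under the assumed definition, Q-measure adds the rank to the denominator, so late retrieval is penalized in the same way as in Average Precision while the graded gains are still taken into account.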