A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems (TOIS).
Statistical inference in retrieval effectiveness evaluation. Information Processing and Management: an International Journal.
A flexible model for retrieval of SGML documents. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Evaluating evaluation measure stability. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '00).
Information Retrieval
The effect of topic set size on retrieval experiment error. Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '02).
On Collection Size and Retrieval Effectiveness. Information Retrieval.
Using graded relevance assessments in IR evaluation. Journal of the American Society for Information Science and Technology.
Measuring retrieval effectiveness: a new proposal and a first experimental validation. Journal of the American Society for Information Science and Technology.
A report on the first year of the INitiative for the evaluation of XML retrieval (INEX'02). Journal of the American Society for Information Science and Technology.
The 27th ACM/SIGIR International Symposium on Information Retrieval 2004
Retrieval evaluation with incomplete information. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.
The overlap problem in content-oriented XML retrieval evaluation. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.
The 27th ACM/SIGIR International Symposium on Information Retrieval 2004
On evaluating web search with very few relevant documents. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.
The TREC robust retrieval track. ACM SIGIR Forum.
Variations on language modeling for information retrieval. ACM SIGIR Forum.
Information retrieval system evaluation: effort, sensitivity, and reliability. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.
Ranking the NTCIR systems based on multigrade relevance. Proceedings of the 2004 international conference on Asian Information Retrieval Technology (AIRS'04).
Reliability tests for the XCG and inex-2002 metrics. Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval (INEX'04).
Evaluating evaluation metrics based on the bootstrap. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '06).
On the reliability of information retrieval metrics based on graded relevance. Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia.
This paper investigates how performance measures and relevance functions affect the comparison of retrieval systems in INEX, an evaluation forum dedicated to XML retrieval. We focus on two interdependent challenges that arise when evaluating XML retrieval systems. The first is the weak ordering of retrieved lists, meaning ties in the ranking. The second is the use of multivalued relevance scales. Our analysis provides empirical evidence on the reasonableness of two popular assumptions in information retrieval (IR) evaluation: that ties can be ignored, and that binary relevance is sufficient. We also shed light on how a parameter of Q-measure [18] affects the sensitivity of the metric.
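The following sketch illustrates the two issues concretely. It assumes Sakai's blended-ratio form of Q-measure. In that form the parameter is written β (`beta` below) and the blended ratio is BR(r) = (β·cg(r) + count(r)) / (β·cig(r) + r). Q-measure is then (1/R)·Σ isrel(r)·BR(r), where R is the number of relevant items. With β = 0 this reduces to binary average precision, which links the parameter to the binary-versus-graded question.

The helper `expected_over_ties` is a hypothetical device, not necessarily the procedure used in this paper. It handles a weakly ordered list by averaging the score over every ordering of the items within each tied group. It uses brute force, so it is only practical for small tie groups. The document ids and gain values are made up for illustration.

```python
from itertools import permutations, product
from typing import Dict, List, Sequence


def q_measure(ranking: Sequence[str], gains: Dict[str, float], beta: float = 1.0) -> float:
    """Q-measure of one strictly ordered list (assumed blended-ratio definition).

    ranking: item ids in rank order, rank 1 first.
    gains:   graded relevance gain per item; missing or zero-gain ids are non-relevant.
    beta:    weight on cumulative gain; beta = 0 yields binary average precision.
    """
    relevant_gains = [g for g in gains.values() if g > 0]
    R = len(relevant_gains)
    if R == 0:
        return 0.0
    ideal = sorted(relevant_gains, reverse=True)  # ideal list: gains in descending order
    cg = 0.0      # cumulative gain of the evaluated list
    cig = 0.0     # cumulative gain of the ideal list
    count = 0     # number of relevant items seen so far
    total = 0.0
    for r, item in enumerate(ranking, start=1):
        if r <= R:
            cig += ideal[r - 1]  # the ideal list has no gain beyond rank R
        g = gains.get(item, 0.0)
        if g > 0:
            cg += g
            count += 1
            # blended ratio, summed only at ranks holding a relevant item
            total += (beta * cg + count) / (beta * cig + r)
    return total / R


def expected_over_ties(groups: List[List[str]], gains: Dict[str, float], beta: float = 1.0) -> float:
    """Hypothetical tie handling: mean Q-measure over all within-group orderings.

    groups: weakly ordered result, e.g. [["a", "x"], ["b"]] means a and x share a score.
    """
    per_group = [list(permutations(g)) for g in groups]
    scores = []
    for choice in product(*per_group):
        ranking = [item for perm in choice for item in perm]
        scores.append(q_measure(ranking, gains, beta))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gains = {"a": 2.0, "b": 1.0}  # illustrative graded gains; "x" is non-relevant
    print(q_measure(["x", "a", "b"], gains, beta=0.0))  # equals AP for this list
    print(q_measure(["x", "a", "b"], gains, beta=1.0))
    print(expected_over_ties([["a", "x"], ["b"]], gains, beta=1.0))
```

In this toy case, breaking the tie between `a` and `x` one way gives 11/12 and the other way gives 43/60. The tie-aware average is 49/60. This gap is the kind of effect that "ties can be ignored" assumes away.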