We propose and evaluate a family of measures, the eXtended Cumulated Gain (XCG) measures, for the evaluation of content-oriented XML retrieval approaches. Our aim is to provide an evaluation framework that accounts for dependency among XML document components. In particular, two aspects of dependency are considered: (1) near-misses, which are document components structurally related to relevant components, such as a neighboring paragraph or a container section, and (2) overlap, the situation where the same text fragment is retrieved multiple times, for example when both a paragraph and its container section are returned. A further requirement is that the measures be flexible enough that different models of user behavior can be instantiated within them. Both system- and user-oriented aspects are investigated, and both recall- and precision-like qualities are measured. We evaluate the reliability of the proposed measures on the INEX 2004 test collection: the effects of assessment variation and topic set size on evaluation stability are investigated, and upper and lower bounds on expected error rates are established. The evaluation demonstrates that the XCG measures are stable and reliable, and in particular that the novel measures of effort-precision and gain-recall (ep/gr) behave comparably to established IR measures such as precision and recall.
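The measures build on the cumulated-gain family, in which graded relevance values are summed down a ranked list and compared against an ideal ranking. The following is a minimal, hypothetical sketch of that underlying idea (not the paper's exact XCG formulation, which additionally models near-misses and overlap); the function names are illustrative, not from the paper.

```python
def cumulated_gain(gains):
    """Running sum of graded relevance values down a ranked list."""
    total, cg = 0.0, []
    for g in gains:
        total += g
        cg.append(total)
    return cg


def normalised_cg(run_gains, all_gains):
    """Divide each rank's cumulated gain by the ideal cumulated gain,
    i.e. the gain obtained by ranking all known relevance values in
    decreasing order (a simplified nxCG-style normalisation)."""
    ideal = cumulated_gain(sorted(all_gains, reverse=True))
    cg = cumulated_gain(run_gains)
    return [c / i for c, i in zip(cg, ideal) if i > 0]
```

For instance, a run returning components with graded gains `[1, 2]` against a pool of known gains `[2, 2, 1]` scores `[0.5, 0.75]`: at each rank the run's accumulated gain is divided by what an ideal ranking would have accumulated by that rank.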