Examining the robustness of evaluation metrics for patent retrieval with incomplete relevance judgements

Authors:
Walid Magdy;Gareth J. F. Jones
Affiliations:
Centre for Next Generation Localization, School of Computing, Dublin City University, Dublin 9, Ireland;Centre for Next Generation Localization, School of Computing, Dublin City University, Dublin 9, Ireland
Venue:
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Year:
2010

Citing 11
Cited 1

Overview of the sixth text REtrieval conference (TREC-6)

Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)
Evaluating evaluation measure stability

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation by highly relevant documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval

Information Retrieval
Modern Information Retrieval

Modern Information Retrieval
Retrieval evaluation with incomplete information

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Bias and the limits of pooling

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
On the robustness of relevance measures with incomplete judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
PRES: a score metric for evaluating recall-oriented information retrieval applications

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
CLEF-IP 2009: retrieval experiments in the intellectual property domain

CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments

Building queries for prior-art search

IRFC'11 Proceedings of the Second international conference on Multidisciplinary information retrieval facility

Quantified Score

Hi-index	0.01

Visualization

Abstract

Recent years have seen a growing interest in research into patent retrieval. One of the key issues in conducting information retrieval (IR) research is meaningful evaluation of the effectiveness of the retrieval techniques applied to task under investigation. Unlike many existing well explored IR tasks where the focus is on achieving high retrieval precision, patent retrieval is to a significant degree a recall focused task. The standard evaluation metric used for patent retrieval evaluation tasks is currently mean average precision (MAP). However this does not reflect system recall well. Meanwhile, the alternative of using the standard recall measure does not reflect user search effort, which is a significant factor in practical patent search environments. In recent work we introduce a novel evaluation metric for patent retrieval evaluation (PRES) [ 13]. This is designed to reflect both system recall and user effort. Analysis of PRES demonstrated its greater effectiveness in evaluating recall-oriented applications than standard MAP and Recall. One dimension of the evaluation of patent retrieval which has not previously been studied is the effect on reliability of the evaluation metrics when relevance judgements are incomplete. We provide a study comparing the behaviour of PRES against the standard MAP and Recall metrics for varying incomplete judgements in patent retrieval. Experiments carried out using runs from the CLEF-IP 2009 datasets show that PRES and Recall are more robust than MAP for incomplete relevance sets for this task with a small preference to PRES as the most robust evaluation metric for patent retrieval with respect to the completeness of the relevance set.