A probabilistic interpretation of precision, recall and F-score, with implication for evaluation

Authors: Cyril Goutte; Eric Gaussier
Affiliations: Xerox Research Centre Europe, Meylan, France (both authors)
Venue: ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Year: 2005

Abstract

We address two problems: (1) assessing the confidence of the standard point estimates of precision, recall and F-score, and (2) comparing the precision, recall and F-score obtained by two different methods. To do so, we use a probabilistic setting which yields posterior distributions on these performance indicators, rather than point estimates. This framework is applied both to the case where different methods are run on different datasets from the same source, and to the standard situation where competing results are obtained on the same data.
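
To make the idea of a posterior distribution on precision, recall and F-score concrete, here is a minimal sketch. It rests on assumptions the abstract does not spell out: the counts of true positives, false positives and false negatives are treated as multinomial, with a symmetric Dirichlet prior (the `prior` parameter, set to 1.0 for a uniform prior). The function names `sample_posteriors`, `credible_interval` and `prob_better` are hypothetical, chosen for illustration only. The comparison helper covers only the case of two methods evaluated on different datasets, where the two posteriors can be sampled independently; it does not handle results obtained on the same data.

```python
import random


def sample_posteriors(tp, fp, fn, prior=1.0, n_samples=10000, seed=0):
    """Draw (precision, recall, F1) triples from an assumed Dirichlet posterior.

    Independent Gamma variates with shapes (count + prior) are proportional
    to a Dirichlet draw; the ratios below do not depend on the normalisation.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        g_tp = rng.gammavariate(tp + prior, 1.0)
        g_fp = rng.gammavariate(fp + prior, 1.0)
        g_fn = rng.gammavariate(fn + prior, 1.0)
        precision = g_tp / (g_tp + g_fp)
        recall = g_tp / (g_tp + g_fn)
        f1 = 2 * g_tp / (2 * g_tp + g_fp + g_fn)
        samples.append((precision, recall, f1))
    return samples


def credible_interval(values, level=0.95):
    """Equal-tailed credible interval estimated from posterior samples."""
    ordered = sorted(values)
    lo = int((1 - level) / 2 * len(ordered))
    hi = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo], ordered[hi]


def prob_better(samples_a, samples_b, index=2):
    """Estimate P(metric_A > metric_B) from independently drawn samples.

    index selects the metric: 0 = precision, 1 = recall, 2 = F1.
    """
    wins = sum(a[index] > b[index] for a, b in zip(samples_a, samples_b))
    return wins / len(samples_a)


if __name__ == "__main__":
    # Illustrative counts only, not taken from the paper.
    system_a = sample_posteriors(tp=80, fp=20, fn=40, seed=1)
    system_b = sample_posteriors(tp=50, fp=50, fn=70, seed=2)
    print("95% interval on F1 (A):", credible_interval([s[2] for s in system_a]))
    print("P(F1_A > F1_B):", prob_better(system_a, system_b))
```

With small counts the intervals are wide, which is the point of reporting a distribution rather than a single number: two systems whose point estimates differ may still have heavily overlapping posteriors.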