Human annotation for Co-reference Resolution (CRR) is labor-intensive and costly, and only a handful of annotated corpora are currently available. Corpora with Named Entity (NE) annotations, however, are widely available. Moreover, unlike current CRR systems, state-of-the-art NER systems achieve very high accuracy and can generate NE labels that closely approximate the gold standard on unlabeled corpora. We propose a new set of metrics, collectively called CONE, for Named Entity Co-reference Resolution (NE-CRR). CONE metrics use a subset of the gold standard annotations, with the advantage that this subset can be easily approximated from NE labels when gold-standard CRR annotations are absent. We define CONE B3 and CONE CEAF metrics based on the traditional B3 and CEAF metrics, and show that a CRR system's CONE B3 and CONE CEAF scores on any dataset are highly correlated with its B3 and CEAF scores, respectively: correlation coefficients exceed 0.6 for all CRR systems across all datasets, with a best case of 0.8. We also present a baseline method for estimating the gold standard required by the CONE metrics, and show that CONE B3 and CONE CEAF scores computed with this estimated gold standard remain correlated with B3 and CEAF scores. We thus demonstrate the suitability of CONE B3 and CONE CEAF for automatic evaluation of NE-CRR.
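For context, the traditional B3 metric that CONE B3 builds on scores each mention by the overlap between its gold ("key") cluster and system ("response") cluster, then averages over mentions. Below is a minimal sketch of standard B3 (not the CONE variant); the function name and the list-of-sets input format are illustrative choices, not from the paper:

```python
def b_cubed(key_clusters, response_clusters):
    """B3 precision, recall, and F1.

    Both arguments are clusterings given as lists of sets of
    mention identifiers. Only mentions present in both clusterings
    are scored (a simplifying assumption for this sketch).
    """
    # Map each mention to the cluster (set) that contains it.
    key_of = {m: c for c in key_clusters for m in c}
    resp_of = {m: c for c in response_clusters for m in c}
    mentions = [m for m in key_of if m in resp_of]

    # Per-mention precision: overlap / size of the response cluster.
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m])
            for m in mentions) / len(mentions)
    # Per-mention recall: overlap / size of the key cluster.
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m])
            for m in mentions) / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

CONE B3, as described in the abstract, applies this computation to a subset of the gold annotations (approximable from NE labels); CEAF differs in that it first finds an optimal one-to-one alignment between key and response clusters.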