Annotation and verification of sense pools in OntoNotes
Information Processing and Management: an International Journal
Annotated corpora are useful only if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify exemplars on which the annotators agree but are nonetheless wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations including word senses, predicate-argument structure, ontology linking, and coreference. To detect mistaken agreements in the word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. We examine the performance of WSD along three dimensions: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline performs better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find the roughly 2% of erroneous agreements that remain in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
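The cleanup idea described above can be sketched in a few lines: run a WSD model over the sense-tagged instances and flag those where the model's prediction disagrees with the agreed-upon annotation, using the entropy of a word's sense distribution as a measure of its ambiguity. The sketch below is a minimal, hypothetical illustration only, not the paper's actual system; the leave-one-out k-NN classifier over bag-of-words contexts and all data structures are assumptions made for the example.

```python
from collections import Counter
import math


def sense_entropy(labels):
    """Entropy (in bits) of a word's annotated sense distribution.

    Higher entropy means a more ambiguous word; the paper reports that
    WSD-based flagging is most effective for such words.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_suspicious(instances, k=3):
    """Return indices of instances whose annotated sense disagrees with a
    leave-one-out k-NN WSD prediction over bag-of-words contexts.

    `instances` is a list of (context_words, annotated_sense) pairs for a
    single target word. Flagged indices are candidates for human review,
    not automatic corrections.
    """
    suspicious = []
    for i, (ctx, sense) in enumerate(instances):
        # Score every other instance by context-word overlap.
        neighbors = []
        for j, (other_ctx, other_sense) in enumerate(instances):
            if i == j:
                continue
            overlap = len(set(ctx) & set(other_ctx))
            neighbors.append((overlap, other_sense))
        neighbors.sort(key=lambda t: -t[0])
        # Majority vote among the k most similar contexts.
        votes = Counter(s for _, s in neighbors[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted != sense:
            suspicious.append(i)
    return suspicious
```

For example, in a toy set of contexts for "bank" where one instance tagged "river" actually occurs in a financial context, `flag_suspicious` would surface that instance for a human to verify, mirroring the candidate-selection step of the cleanup procedure.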