Annotation and verification of sense pools in OntoNotes
Information Processing and Management: an International Journal
Annotated corpora are useful only if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify exemplars on which the annotators agree but are nonetheless wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations including word senses, predicate-argument structure, ontology linking, and coreference. To detect mistaken agreements in the word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. We examine the performance of WSD along three dimensions: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline performs better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find the roughly 2% of erroneous agreements that remain in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
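The cleanup idea described above can be sketched in a few lines: run a WSD model over the sense-tagged instances and flag those where the model's prediction disagrees with the agreed-upon annotation, using the entropy of a word's sense distribution as a measure of its ambiguity. The sketch below is a minimal, hypothetical illustration only, not the paper's actual system; the leave-one-out k-NN classifier over bag-of-words contexts and all data structures are assumptions made for the example.

```python
from collections import Counter
import math


def sense_entropy(labels):
    """Entropy (in bits) of a word's annotated sense distribution.

    Higher entropy means a more ambiguous word; the paper reports that
    WSD-based flagging is most effective for such words.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_suspicious(instances, k=3):
    """Return indices of instances whose annotated sense disagrees with a
    leave-one-out k-NN WSD prediction over bag-of-words contexts.

    `instances` is a list of (context_words, annotated_sense) pairs for a
    single target word. Flagged indices are candidates for human review,
    not automatic corrections.
    """
    suspicious = []
    for i, (ctx, sense) in enumerate(instances):
        # Score every other instance by context-word overlap.
        neighbors = []
        for j, (other_ctx, other_sense) in enumerate(instances):
            if i == j:
                continue
            overlap = len(set(ctx) & set(other_ctx))
            neighbors.append((overlap, other_sense))
        neighbors.sort(key=lambda t: -t[0])
        # Majority vote among the k most similar contexts.
        votes = Counter(s for _, s in neighbors[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted != sense:
            suspicious.append(i)
    return suspicious
```

For example, in a toy set of contexts for "bank" where one instance tagged "river" actually occurs in a financial context, `flag_suspicious` would surface that instance for a human to verify, mirroring the candidate-selection step of the cleanup procedure.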