Word sense disambiguation using OntoNotes: an empirical study

Authors:
Zhi Zhong; Hwee Tou Ng; Yee Seng Chan
Affiliations:
National University of Singapore, Law Link, Singapore; National University of Singapore, Law Link, Singapore; National University of Singapore, Law Link, Singapore
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

The accuracy of current word sense disambiguation (WSD) systems is affected by the fine-grained sense inventory of WordNet as well as a lack of training examples. Using the WSD examples provided through OntoNotes, we conduct the first large-scale WSD evaluation involving hundreds of word types and tens of thousands of sense-tagged examples, while adopting a coarse-grained sense inventory. We show that though WSD systems trained with a large number of examples can obtain a high level of accuracy, they nevertheless suffer a substantial drop in accuracy when applied to a different domain. To address this issue, we propose combining a domain adaptation technique using feature augmentation with active learning. Our results show that this approach is effective in reducing the annotation effort required to adapt a WSD system to a new domain. Finally, we propose that one can maximize the dual benefits of reducing the annotation effort while ensuring an increase in WSD accuracy, by only performing active learning on the set of most frequently occurring word types.
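The two ingredients named in the abstract, feature augmentation and active learning, can be illustrated with a minimal sketch. The abstract does not spell out the exact formulation, so this assumes the common scheme in which every feature is copied into a shared "general" version and a domain-specific version, and it assumes least-confidence uncertainty sampling as the selection criterion. The function names (`augment`, `least_confident`), the feature strings, and the `predict_proba` callable are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def augment(features: Features, domain: str) -> Features:
    """Copy each feature into a shared version and a domain-specific version.

    Assumed scheme: a classifier trained on the augmented features can learn
    one weight that applies across domains and one that applies only to
    `domain` (for example "source" or "target").
    """
    augmented: Features = {}
    for name, value in features.items():
        augmented["general:" + name] = value
        augmented[domain + ":" + name] = value
    return augmented


def least_confident(
    pool: Sequence[Features],
    predict_proba: Callable[[Features], Dict[str, float]],
) -> int:
    """Return the index of the unlabeled example to annotate next.

    Assumed criterion: pick the example whose most probable sense has the
    lowest probability under the current model. `predict_proba` is a
    hypothetical callable mapping features to a sense -> probability dict.
    """
    def confidence(index: int) -> float:
        return max(predict_proba(pool[index]).values())

    return min(range(len(pool)), key=confidence)


if __name__ == "__main__":
    # Hypothetical context feature for one target-domain occurrence of a word.
    print(augment({"left_word=river": 1.0}, "target"))
```

In an adaptation loop under these assumptions, source-domain examples would be passed through `augment(..., "source")`, newly annotated target-domain examples through `augment(..., "target")`, and `least_confident` would choose which target-domain example to annotate in each round. Restricting the pool to the most frequent word types, as the abstract proposes, would amount to filtering `pool` before selection.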