Verbs that can have more than one meaning pose problems for Natural Language Processing (NLP) applications. While homonyms (words with unrelated meanings) are fairly tractable, polysemous verbs with similar, related meanings pose the greatest hurdle for automatic Word Sense Disambiguation (WSD). A major problem with WSD for verbs is that even humans disagree about what constitutes a different sense for a polysemous word. This thesis investigates verb lexical semantics and their computational representations, and how these can be used for automatic WSD. Our main contribution is in defining criteria by which humans make sense distinctions for verbs, and in translating these criteria into linguistically motivated features that we use to build a state-of-the-art automatic WSD system. Our explicit criteria for sense distinctions allow humans to sense-tag data more consistently. Improved human performance on the WSD task enables improved system performance.

We begin by examining the definition of verb polysemy implicit in Levin verb classes. We describe our work on VerbNet, a lexical resource in which different senses of a verb are defined by membership in different verb classes; the classes have distinctive syntactic frames and explicit semantic predicates that characterize the verb senses in that class. We then translate some of these lexical semantic characteristics into richer linguistic features used to build our automatic WSD system. The system performs competitively on the English verbs of Senseval-1 and Senseval-2 by combining information from syntax, lexical collocations, and semantic class constraints on verb arguments. Adding gold-standard predicate-argument information from PropBank further improves system performance.

Because humans have difficulty making fine-grained sense distinctions, creation of manually sense-tagged corpora is time-consuming and expensive. We experiment with active learning to get additional training data for our system, but find that the quality of manually sense-tagged data is limited by an inconsistent or unclear sense inventory. We develop criteria for grouping senses and show that well-defined groupings of WordNet senses can improve both human inter-annotator agreement and system performance. The groupings fit into a hierarchy of WordNet senses that allows different NLP applications to use different granularities of sense distinctions.