CFILT: Resource conscious approaches for all-words domain specific WSD

Authors:
Anup Kulkarni;Mitesh M. Khapra;Saurabh Sohoney;Pushpak Bhattacharyya
Affiliations:
Indian Institute of Technology Bombay, Powai, Mumbai, India (all authors)
Venue:
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Year:
2010


Abstract

We describe two approaches for All-words Word Sense Disambiguation on a Specific Domain. The first is a knowledge-based approach that extracts domain-specific largest connected components from the WordNet graph by exploiting the semantic relations between all candidate synsets appearing in a domain-specific untagged corpus. Given a test word, disambiguation is performed by considering only those candidate synsets that belong to the top-k largest connected components. The second is a weakly supervised approach that relies on the "One Sense Per Domain" heuristic and uses a few hand-labeled examples for the most frequently appearing words in the target domain. Once the most frequent words have been disambiguated, they provide strong clues for disambiguating the other words in the sentence through an iterative disambiguation algorithm. Our weakly supervised system gave the best performance among all systems that participated in the task, even when it used as few as 100 hand-labeled examples from the target domain.
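
The candidate-filtering step of the first approach can be illustrated with a minimal sketch. It assumes a toy sense inventory and a handful of hand-written relation edges standing in for the WordNet graph; every name in it (`sense_inventory`, `relations`, `connected_components`, `filter_candidates`) is hypothetical and not taken from the paper. The abstract does not say how the final sense is chosen among the surviving candidates, so the sketch stops at filtering.

```python
from collections import defaultdict, deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep only relations whose endpoints are both candidate synsets.
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:  # breadth-first search from the unseen start node
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)

def filter_candidates(word, sense_inventory, top_components):
    """Keep only the senses of `word` that lie in one of the given components."""
    allowed = set().union(*top_components)
    return [sense for sense in sense_inventory[word] if sense in allowed]

# Toy stand-in for a sense inventory (assumed data, not real WordNet synsets).
sense_inventory = {
    "bank": ["bank.river", "bank.finance"],
    "river": ["river.stream"],
    "water": ["water.liquid"],
    "flood": ["flood.overflow"],
    "loan": ["loan.money"],
}

# Toy stand-in for semantic relations between synsets (assumed data).
relations = [
    ("bank.river", "river.stream"),
    ("river.stream", "water.liquid"),
    ("water.liquid", "flood.overflow"),
    ("bank.finance", "loan.money"),
]

# Words observed in the untagged domain corpus; "loan" never occurs in it.
corpus_words = ["bank", "river", "water", "flood"]
candidates = {s for w in corpus_words for s in sense_inventory[w]}

components = connected_components(candidates, relations)
k = 1  # number of largest components to keep (a tunable assumption)
result = filter_candidates("bank", sense_inventory, components[:k])
print(result)  # ['bank.river']
```

In this toy domain the financial sense of "bank" ends up in a singleton component, because its only neighbour never appears in the corpus, so keeping the single largest component leaves just the river sense.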