All words domain adapted WSD: finding a middle ground between supervision and unsupervision

Authors:
Mitesh M. Khapra;Anup Kulkarni;Saurabh Sohoney;Pushpak Bhattacharyya
Affiliations:
Indian Institute of Technology Bombay, Mumbai, India;Indian Institute of Technology Bombay, Mumbai, India;Indian Institute of Technology Bombay, Mumbai, India;Indian Institute of Technology Bombay, Mumbai, India
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Citing 12
Cited 3

An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Automatic retrieval and clustering of similar words

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Integrating multiple knowledge sources to disambiguate word sense: an exemplar-based approach

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A semantic concordance

HLT '93 Proceedings of the workshop on Human Language Technology
An empirical study of the domain dependence of supervised word sense disambiguation systems

EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Finding predominant word senses in untagged text

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Domain-specific sense distributions and predominant sense acquisition

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Unsupervised acquisition of predominant word senses

Computational Linguistics
Supervised domain adaption for WSD

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
SemEval-2010 task 17: all-words word sense disambiguation on a specific domain

DEW '09 Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
Knowledge-based WSD on specific domains: performing better than generic supervised WSD

IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence

Two birds with one stone: learning semantic models for text categorization and word sense disambiguation

Proceedings of the 20th ACM international conference on Information and knowledge management
A quick tour of word sense disambiguation, induction and related approaches

SOFSEM'12 Proceedings of the 38th international conference on Current Trends in Theory and Practice of Computer Science
A new minimally-supervised framework for domain word sense disambiguation

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

In spite of decades of research on word sense disambiguation (WSD), all-words general purpose WSD has remained a distant goal. Many supervised WSD systems have been built, but the effort of creating the training corpus - annotated sense marked corpora - has always been a matter of concern. Therefore, attempts have been made to develop unsupervised and knowledge based techniques for WSD which do not need sense marked corpora. However such approaches have not proved effective, since they typically do not better Wordnet first sense baseline accuracy. Our research reported here proposes to stick to the supervised approach, but with far less demand on annotation. We show that if we have ANY sense marked corpora, be it from mixed domain or a specific domain, a small amount of annotation in ANY other domain can deliver the goods almost as if exhaustive sense marking were available in that domain. We have tested our approach across Tourism and Health domain corpora, using also the well known mixed domain SemCor corpus. Accuracy figures close to self domain training lend credence to the viability of our approach. Our contribution thus lies in finding a convenient middle ground between pure supervised and pure unsupervised WSD. Finally, our approach is not restricted to any specific set of target words, a departure from a commonly observed practice in domain specific WSD.