On robustness and domain adaptation using SVD for word sense disambiguation

Authors:
Eneko Agirre; Oier Lopez de Lacalle
Affiliations:
University of the Basque Country, Donostia, Basque Country; University of the Basque Country, Donostia, Basque Country
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Abstract

In this paper we explore robustness and domain adaptation issues for Word Sense Disambiguation (WSD) using Singular Value Decomposition (SVD) and unlabeled data. We focus on the semi-supervised domain adaptation scenario, in which we train on a source corpus, test on a target corpus, and try to improve results using unlabeled data. Our method yields up to a 16.3% error reduction compared to state-of-the-art systems, making ours the first work to report successful semi-supervised domain adaptation. Surprisingly, the improvement comes from unlabeled data drawn from the source corpus rather than from the target corpora, which means that the gain reflects robustness rather than domain adaptation. In addition, we study the behavior of our system on the target domain.
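The abstract does not spell out how SVD and unlabeled data are combined. The following is a minimal sketch of one common, LSI-style way to do it, and it should not be read as the authors' implementation. Labeled and unlabeled contexts of a target word are stacked into an instance-by-feature matrix, a truncated SVD of that matrix is computed, and instances are folded into the reduced space before classification. The function names (`fit_svd_basis`, `project`, `knn_predict`, `relative_error_reduction`), the cosine k-NN classifier, and the toy "finance"/"river" data are all hypothetical choices made for brevity. The helper `relative_error_reduction` assumes that "error reduction" means the relative drop in error rate, (err_base - err_new) / err_base.

```python
from collections import Counter

import numpy as np


def fit_svd_basis(X_labeled, X_unlabeled=None, k=100):
    """Return (V_k, s_k) for a rank-k LSI-style projection (hypothetical helper).

    Rows are occurrences (contexts) of the target word; columns are context
    features such as surrounding words. Unlabeled rows only shape the latent
    space; their senses are never used.
    """
    M = X_labeled if X_unlabeled is None else np.vstack([X_labeled, X_unlabeled])
    # Thin SVD: M = U @ diag(s) @ Vt, singular values in descending order.
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep at most k directions and drop (near-)zero singular values.
    keep = min(k, int(np.sum(s > 1e-10)))
    return Vt[:keep].T, s[:keep]


def project(X, V_k, s_k):
    """Fold rows into the latent space: x -> x @ V_k @ diag(1 / s_k)."""
    return (X @ V_k) / s_k


def _unit_rows(Z):
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors
    return Z / norms


def knn_predict(Z_train, y_train, Z_test, n_neighbors=1):
    """Assumed classifier: majority sense among the most cosine-similar contexts."""
    sims = _unit_rows(Z_test) @ _unit_rows(Z_train).T
    predictions = []
    for row in sims:
        nearest = np.argsort(-row)[:n_neighbors]
        votes = Counter(y_train[i] for i in nearest)  # ties keep nearest-first order
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def relative_error_reduction(acc_baseline, acc_new):
    """Fraction of the baseline's errors removed by the new system."""
    err_base, err_new = 1.0 - acc_baseline, 1.0 - acc_new
    return (err_base - err_new) / err_base


# Toy example with made-up counts: features 0-1 co-occur with one sense,
# features 2-3 with the other.
X_train = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]], dtype=float)
y_train = ["finance", "finance", "river", "river"]
X_unlabeled = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [3, 0, 0, 0]], dtype=float)
X_test = np.array([[1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)

V_k, s_k = fit_svd_basis(X_train, X_unlabeled, k=2)
Z_train, Z_test = project(X_train, V_k, s_k), project(X_test, V_k, s_k)
print(knn_predict(Z_train, y_train, Z_test))  # ['finance', 'river']
```

In this sketch the unlabeled rows could come from either the source or the target corpus, since only the stacking step changes. That is the variable the abstract reports on. Folding in with `diag(1 / s_k)` maps each stacked training row exactly onto its row of `U_k`, so new instances land in the same coordinate system as the data the SVD was computed from.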