Exploring representation-learning approaches to domain adaptation

  • Authors:
  • Fei Huang; Alexander Yates

  • Affiliations:
  • Temple University, Philadelphia, PA (both authors)

  • Venue:
  • Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing (DANLP 2010)
  • Year:
  • 2010


Abstract

Most supervised language processing systems show a significant drop in performance when tested on text from a domain substantially different from that of the training data. Sequence labeling systems such as part-of-speech taggers are typically trained on newswire text, and their error rate can triple or worse when they are tested on, for example, biomedical data. We investigate techniques for building open-domain sequence labeling systems that approach the ideal of a system whose accuracy is high and constant across domains. In particular, we investigate unsupervised techniques for representation learning that provide new features which are stable across domains, in that they are predictive in both the training data and the out-of-domain test data. In experiments, our novel techniques reduce error by as much as 29% relative to the previous state of the art on out-of-domain text.
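To make the idea of domain-stable features concrete, here is a minimal sketch of deriving a distributional feature from unlabeled text spanning two domains. This is an illustrative toy, not the authors' actual method (which learns richer representations): each word is represented by its most frequent left neighbor, so words that fill similar syntactic slots share a feature value even when the words themselves never appear in the labeled training data.

```python
from collections import Counter, defaultdict

def left_context_feature(sentences):
    """For each word in the unlabeled corpus, return its most frequent
    left neighbor. This acts as a crude learned representation: it is
    computed without labels, and it groups words by syntactic context
    rather than by surface identity."""
    left = defaultdict(Counter)
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            left[word][prev] += 1
    return {w: counts.most_common(1)[0][0] for w, counts in left.items()}

# Toy unlabeled corpora: one newswire-like sentence, one biomedical-like.
unlabeled = [
    "the senator announced the bill".split(),
    "the protein binds the receptor".split(),
]
feats = left_context_feature(unlabeled)
# 'senator' (seen in training-domain text) and 'protein' (out-of-domain)
# receive the same feature value ('the'), so a tagger using this feature
# can transfer evidence about nouns across domains.
```

A tagger would use such feature values alongside standard lexical features; because the representation is estimated from unlabeled text of both domains, it remains predictive at test time even for out-of-domain vocabulary.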