Domain adaptation for text categorization by feature labeling

  • Authors:
  • Cristina Kadar; José Iria

  • Affiliations:
  • IBM Research Zurich, Rüschlikon, Switzerland; IBM Research Zurich, Rüschlikon, Switzerland

  • Venue:
  • ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
  • Year:
  • 2011

Abstract

We present a novel approach to domain adaptation for text categorization that requires only that the source domain data be weakly annotated in the form of labeled features. The main advantage of our approach is that labeling words is less expensive than labeling documents. We propose two methods. The first seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second augments the set of labeled features in an unsupervised way by discovering a latent concept space shared between source and target. We show empirically that our approach outperforms standard supervised and semi-supervised methods and obtains results competitive with those reported by state-of-the-art domain adaptation methods, while requiring considerably less supervision.
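
To make the idea of weak supervision by labeled features concrete, the following is a minimal sketch, not the authors' method: it assigns a provisional class to unlabeled documents using only words that have been annotated with a class. The word lists, class names, and the `pseudo_label` helper are hypothetical and chosen for illustration; neither the divergence minimization nor the latent concept space described in the abstract is implemented here.

```python
from collections import Counter

# Hypothetical labeled features: individual words annotated with a class.
# No document in either domain carries a label.
LABELED_FEATURES = {
    "sports": {"match", "goal", "team"},
    "finance": {"stock", "market", "bank"},
}


def pseudo_label(document, labeled_features=LABELED_FEATURES):
    """Return the class whose labeled words occur most often in the document.

    Returns None when no labeled word occurs at all. Ties go to the class
    listed first in labeled_features.
    """
    # Naive whitespace tokenization; punctuation is not stripped.
    tokens = Counter(document.lower().split())
    scores = {
        label: sum(tokens[word] for word in words)
        for label, words in labeled_features.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # Illustrative unlabeled target-domain documents (made up for this sketch).
    target_docs = [
        "The team scored a late goal",
        "Bank stock fell as the market closed",
        "Nothing relevant here",
    ]
    for doc in target_docs:
        print(doc, "->", pseudo_label(doc))
```

Documents that contain no labeled word stay unlabeled in this sketch. That gap is the kind of coverage problem that enlarging the set of labeled features, as the second method in the abstract does, is meant to reduce.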