A partially supervised cross-collection topic model for cross-domain text classification

  • Authors:
  • Yang Bao;Nigel Collier;Anindya Datta

  • Affiliations:
  • National University of Singapore, Singapore, Singapore;National Institute of Informatics, Tokyo, Japan;National University of Singapore, Singapore, Singapore

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Cross-domain text classification aims to automatically train a precise text classifier for a target domain by using labelled text data from a related source domain. To this end, one of the most promising ideas is to induce a new feature representation so that the distributional difference between domains can be reduced and a more accurate classifier can be learned in this new feature space. However, most existing methods do not explore the duality of the marginal distribution of examples and the conditional distribution of class labels given labeled training examples in the source domain. Besides, few previous works attempt to explicitly distinguish the domain-independent and domain-specific latent features and align the domain-specific features to further improve the cross-domain learning. In this paper, we propose a model called Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning with the purpose of addressing these two issues in a unified way. Experimental results on nine datasets show that our model outperforms two standard classifiers and four state-of-the-art methods, which demonstrates the effectiveness of our proposed model.