Concept learning for cross-domain text classification: a general probabilistic framework

  • Authors:
  • Fuzhen Zhuang; Ping Luo; Peifeng Yin; Qing He; Zhongzhi Shi

  • Affiliations:
  • Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; Hewlett Packard Labs, China; Pennsylvania State University; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences

  • Venue:
  • IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
  • Year:
  • 2013

Abstract

Cross-domain learning aims to leverage knowledge from source domains to train accurate models for test data from target domains with different but related data distributions. To tackle the difference in data distributions over raw features, previous works proposed to mine high-level concepts (e.g., word clusters) across data domains, which have been shown to be more appropriate for classification. However, all of these works assume that the same set of concepts is shared by the source and target domains, even though some distinct concepts may exist in only one of the domains. Thus, a general framework that can incorporate both shared and distinct concepts is needed for cross-domain classification. To this end, we develop a probabilistic model in which both the shared and distinct concepts are learned by an EM process that optimizes the data likelihood. To validate the effectiveness of this model, we intentionally construct classification tasks in which distinct concepts exist in the data domains. Systematic experiments demonstrate the superiority of our model over all compared baselines, especially on the more challenging tasks. A hedged sketch of the shared/distinct-concept idea follows the abstract.
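
The sketch below is a minimal illustration of the idea, assuming a simplified PLSA-style mixture rather than the paper's exact formulation: each domain mixes `K_SHARED` word clusters whose word distributions are tied across domains (shared concepts) with `K_DISTINCT` clusters private to that domain (distinct concepts), and all parameters are fit by EM on the data likelihood. All names, sizes, and iteration counts here are illustrative assumptions, not the authors' code.

```python
# Simplified PLSA-style EM with shared and domain-distinct word clusters.
# NOT the paper's exact model; an illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
V = 50                          # vocabulary size (toy)
K_SHARED, K_DISTINCT = 3, 2     # shared vs. per-domain concept counts

def normalize(a, axis):
    s = a.sum(axis=axis, keepdims=True)
    return a / np.maximum(s, 1e-12)

# toy word-count matrices for a source and a target domain
counts = {d: rng.poisson(1.0, size=(20, V)).astype(float)
          for d in ("src", "tgt")}

# shared topic-word distributions are tied across domains;
# distinct ones and doc-topic mixtures are per domain
phi_shared = normalize(rng.random((K_SHARED, V)), axis=1)
phi_distinct = {d: normalize(rng.random((K_DISTINCT, V)), axis=1)
                for d in counts}
theta = {d: normalize(rng.random((X.shape[0], K_SHARED + K_DISTINCT)), axis=1)
         for d, X in counts.items()}

for _ in range(50):                         # EM on the data likelihood
    shared_acc = np.zeros_like(phi_shared)  # pooled over both domains
    for d, X in counts.items():
        phi = np.vstack([phi_shared, phi_distinct[d]])   # (K, V)
        # E-step: responsibilities p(z | doc, word) for every cell
        joint = theta[d][:, :, None] * phi[None, :, :]   # (D, K, V)
        resp = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        weighted = resp * X[:, None, :]                  # expected counts
        # M-step: doc-topic mixtures and topic-word distributions
        theta[d] = normalize(weighted.sum(axis=2), axis=1)
        topic_word = weighted.sum(axis=0)                # (K, V)
        shared_acc += topic_word[:K_SHARED]              # tie shared concepts
        phi_distinct[d] = normalize(topic_word[K_SHARED:], axis=1)
    phi_shared = normalize(shared_acc, axis=1)           # joint update

print("target doc-concept mixture (first doc):",
      np.round(theta["tgt"][0], 3))
```

Tying the shared cluster rows across domains while leaving the distinct rows free is the design choice that lets knowledge transfer through the shared concepts without forcing the domains to agree on every concept.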