Cross-domain learning aims to leverage knowledge from source domains to train accurate models for test data drawn from target domains with different but related data distributions. To tackle the challenge of distribution differences at the level of raw features, previous works proposed mining high-level concepts (e.g., word clusters) across data domains, which have been shown to be more appropriate for classification. However, all of these works assume that the same set of concepts is shared between the source and target domains, even though some distinct concepts may exist in only one of the domains. Thus, we need a general framework for cross-domain classification that can incorporate both shared and distinct concepts. To this end, we develop a probabilistic model in which both shared and distinct concepts are learned via an EM procedure that optimizes the data likelihood. To validate the effectiveness of this model, we deliberately construct classification tasks in which distinct concepts exist in the data domains. Systematic experiments demonstrate the superiority of our model over all compared baselines, especially on the more challenging tasks.
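To make the idea concrete, the following is a minimal sketch of an EM procedure for a PLSA-style model with word clusters (topics) that are shared across domains plus clusters private to each domain. This is an illustrative toy, not the paper's exact model: the function name, the topic-masking scheme, and all parameters are assumptions made for the sketch.

```python
import numpy as np

def plsa_shared_distinct(X, domains, n_shared=2, n_distinct=1,
                         n_iter=50, seed=0):
    """EM for a PLSA-style model with shared and domain-private topics.

    X       : (D, W) document-word count matrix
    domains : (D,) domain label per document
    Returns P(z|d) of shape (D, K) and P(w|z) of shape (K, W),
    where K = n_shared + n_distinct * number_of_domains.
    """
    rng = np.random.default_rng(seed)
    D, W = X.shape
    doms = np.unique(domains)
    K = n_shared + n_distinct * len(doms)

    # mask[d, k] = 1 if topic k is available to document d:
    # shared topics are open to every document, distinct topics
    # only to documents from their own domain.
    mask = np.zeros((D, K))
    mask[:, :n_shared] = 1.0
    for i, g in enumerate(doms):
        cols = slice(n_shared + i * n_distinct,
                     n_shared + (i + 1) * n_distinct)
        mask[domains == g, cols] = 1.0

    # random initialization of P(z|d) (masked) and P(w|z)
    p_z_d = rng.random((D, K)) * mask
    p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((K, W))
    p_w_z /= p_w_z.sum(1, keepdims=True)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) P(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (D, K, W)
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: expected counts, weighted by observed word counts
        counts = post * X[:, None, :]                      # (D, K, W)
        p_w_z = counts.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = counts.sum(2) * mask                       # keep mask
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12

    return p_z_d, p_w_z
```

The mask is what encodes the shared/distinct split: a domain-private topic receives expected counts only from its own domain's documents, while shared topics pool evidence from all domains, so both kinds of concepts are fit jointly by the same likelihood-maximizing EM loop.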