The problem of distribution differences among multiple data domains has been studied in cross-domain text classification. In this study, we present two new observations along this line. First, the distribution difference may arise because different domains use different key words to express the same concept. Second, the association between these word concepts and the document classes may be stable across domains. These two observations correspond, respectively, to the distinction and the commonality across data domains. Inspired by them, we propose a generative statistical model, named Collaborative Dual-PLSA (CD-PLSA), to simultaneously capture both the domain distinction and the commonality among multiple domains. Unlike Probabilistic Latent Semantic Analysis (PLSA), which has only one latent variable, the proposed model has two latent factors y and z, corresponding to the word concept and the document class respectively. The shared commonality intertwines with the distinctions over multiple domains, and also serves as the bridge for knowledge transfer. We derive an Expectation-Maximization (EM) algorithm to learn this model, and also propose a distributed version to handle the situation where the data domains are geographically separated from each other. Finally, we conduct extensive experiments over hundreds of classification tasks with multiple source domains and multiple target domains to validate the superiority of the proposed CD-PLSA model over existing state-of-the-art supervised and transfer learning methods. In particular, we show that CD-PLSA is more tolerant of distribution differences.
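The two-latent-factor idea can be illustrated with a minimal single-domain sketch. The abstract does not give the model's equations, so the factorization P(w, d) = Σ_{y,z} P(w|y) P(d|z) P(y, z) and the EM updates below are assumptions based on a generic dual-latent PLSA; the full CD-PLSA additionally ties P(y, z) across multiple domains, which this toy version omits.

```python
import numpy as np

def dual_plsa_em(counts, n_concepts, n_classes, n_iters=50, seed=0):
    """EM for a toy dual-latent PLSA (hypothetical sketch, not the paper's
    full multi-domain CD-PLSA).

    counts: (n_docs, n_words) term-count matrix.
    Model assumption: P(w, d) = sum_{y,z} P(w|y) P(d|z) P(y, z),
    where y indexes word concepts and z indexes document classes.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random normalized initialization of the three factors.
    p_w_y = rng.random((n_words, n_concepts))
    p_w_y /= p_w_y.sum(axis=0, keepdims=True)      # columns are P(w|y)
    p_d_z = rng.random((n_docs, n_classes))
    p_d_z /= p_d_z.sum(axis=0, keepdims=True)      # columns are P(d|z)
    p_yz = rng.random((n_concepts, n_classes))
    p_yz /= p_yz.sum()                             # joint P(y, z)

    for _ in range(n_iters):
        # E-step: posterior P(y, z | d, w), shape (docs, words, y, z).
        joint = (p_w_y[None, :, :, None]
                 * p_d_z[:, None, None, :]
                 * p_yz[None, None, :, :])
        post = joint / joint.sum(axis=(2, 3), keepdims=True)

        # M-step: re-estimate each factor from expected counts.
        weighted = counts[:, :, None, None] * post
        p_w_y = weighted.sum(axis=(0, 3))          # sum over docs, z
        p_w_y /= p_w_y.sum(axis=0, keepdims=True)
        p_d_z = weighted.sum(axis=(1, 2))          # sum over words, y
        p_d_z /= p_d_z.sum(axis=0, keepdims=True)
        p_yz = weighted.sum(axis=(0, 1))           # sum over docs, words
        p_yz /= p_yz.sum()

    return p_w_y, p_d_z, p_yz
```

In a multi-domain setting one would keep a separate P(w|y) per domain (the distinction) while sharing P(y, z) across domains (the commonality that bridges knowledge transfer), as the abstract describes.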