A partially supervised cross-collection topic model for cross-domain text classification

Authors:
Yang Bao; Nigel Collier; Anindya Datta
Affiliations:
National University of Singapore, Singapore, Singapore; National Institute of Informatics, Tokyo, Japan; National University of Singapore, Singapore, Singapore
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Abstract

Cross-domain text classification aims to automatically train an accurate text classifier for a target domain by using labeled text data from a related source domain. One of the most promising ideas is to induce a new feature representation in which the distributional difference between domains is reduced, so that a more accurate classifier can be learned in this new feature space. However, most existing methods do not explore the duality between the marginal distribution of examples and the conditional distribution of class labels given the labeled training examples in the source domain. Moreover, few previous works explicitly distinguish domain-independent from domain-specific latent features, or align the domain-specific features to further improve cross-domain learning. In this paper, we propose the Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning, which addresses both issues in a unified way. Experimental results on nine datasets show that our model outperforms two standard classifiers and four state-of-the-art methods, demonstrating the effectiveness of the proposed model.
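The abstract does not spell out PSCCLDA's generative process. As a rough, non-authoritative illustration of what "distinguishing domain-independent and domain-specific latent features" can mean in a cross-collection topic model, the sketch below computes a document's word distribution under a simplified mixture: each topic has a shared (domain-independent) word distribution and a word distribution specific to the document's domain, and a word is drawn from the shared one with probability `lam`. The function name `word_distribution`, its parameters, the single mixing weight `lam`, and the toy vocabulary are all hypothetical assumptions for illustration; this is not the PSCCLDA model, its supervision, or its inference procedure.

```python
from typing import Dict, List

# A word distribution: maps each word to its probability (hypothetical toy format).
Dist = Dict[str, float]


def word_distribution(
    theta: List[float],         # document's topic proportions (assumed to sum to 1)
    phi_shared: List[Dist],     # per-topic domain-independent word distributions
    phi_domain: List[Dist],     # per-topic word distributions for the document's domain
    lam: float,                 # assumed probability of using the shared distribution
) -> Dist:
    """Return p(w | d) = sum_z theta[z] * (lam * phi_shared[z][w] + (1 - lam) * phi_domain[z][w]).

    A simplified cross-collection mixture used only for illustration.
    """
    if len(theta) != len(phi_shared) or len(theta) != len(phi_domain):
        raise ValueError("theta, phi_shared and phi_domain need one entry per topic")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    p: Dist = {}
    for t, shared, specific in zip(theta, phi_shared, phi_domain):
        # Mass contributed by the domain-independent part of topic z.
        for w, prob in shared.items():
            p[w] = p.get(w, 0.0) + t * lam * prob
        # Mass contributed by the domain-specific part of topic z.
        for w, prob in specific.items():
            p[w] = p.get(w, 0.0) + t * (1.0 - lam) * prob
    return p


if __name__ == "__main__":
    # Toy "books" document: sentiment words like "good"/"bad" are shared across
    # domains, while "boring" and "pages" are specific to the books domain.
    p = word_distribution(
        theta=[0.5, 0.5],
        phi_shared=[{"good": 0.5, "bad": 0.5}, {"price": 1.0}],
        phi_domain=[{"boring": 1.0}, {"pages": 1.0}],
        lam=0.5,
    )
    print(p)
```

In this toy setting, a classifier trained on the shared part of each topic could transfer across domains, while the domain-specific parts absorb vocabulary that differs between source and target; how PSCCLDA actually ties these parts to class labels and aligns the domain-specific features is described in the paper, not here.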