Exploiting associations between word clusters and document classes for cross-domain text categorization

Authors:
Fuzhen Zhuang;Ping Luo;Hui Xiong;Qing He;Yuhong Xiong;Zhongzhi Shi
Affiliations:
The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences and Graduate University of Chinese Academy of Sciences, Beijing 100039, Chi ...;Hewlett Packard Labs China;MSIS Department, Rutgers University;The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences;Innovation Works;The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
Venue:
Statistical Analysis and Data Mining
Year:
2011

Citing 0
Cited 3

Predicting positive and negative links in signed social networks by transfer learning
Proceedings of the 22nd international conference on World Wide Web

Domain adaptation with topical correspondence learning
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Towards robust co-clustering
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

Cross-domain text categorization aims to adapt the knowledge learnt from a labeled source domain to an unlabeled target domain, where the documents from the source and target domains are drawn from different distributions. However, although the distributions of raw-word features differ, the associations between word clusters (conceptual features) and document classes may remain stable across domains. In this paper, we exploit these unchanged associations as the bridge for transferring knowledge from the source domain to the target domain via non-negative matrix tri-factorization. Specifically, we formulate a joint optimization framework of two matrix tri-factorizations, one for the source-domain data and one for the target-domain data, in which the associations between word clusters and document classes are shared. We then give an iterative algorithm for this optimization and theoretically show its convergence. Comprehensive experiments show the effectiveness of this method. In particular, we show that the proposed method can deal with some difficult scenarios where baseline methods usually do not perform well. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 100–114, 2011 (This is an invited submission from the Best of SDM 2010.)
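
The shared-association idea in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `joint_tri_factorization`, its parameters, and the synthetic data are all hypothetical, and the update rules are the standard multiplicative updates for unconstrained non-negative tri-factorization, swapped in for illustration. The published algorithm may use different updates and constraints. The sketch assumes each word-by-document matrix X is approximated as F S G^T, where F holds word clusters, G holds document classes, and the association matrix S is shared by both domains. It also assumes the source class matrix is fixed to the known labels.

```python
import numpy as np

def joint_tri_factorization(X_s, y_s, X_t, n_word_clusters, n_classes,
                            n_iter=100, eps=1e-9, seed=0):
    """Illustrative joint tri-factorization with a shared association matrix S.

    X_s, X_t: non-negative word-by-document matrices over the same vocabulary.
    y_s: integer class labels of the source documents.
    """
    rng = np.random.default_rng(seed)
    m = X_s.shape[0]
    n_t = X_t.shape[1]
    # Assumption: source document-class memberships are fixed one-hot labels.
    G_s = np.eye(n_classes)[y_s]
    # All other factors start from random positive values.
    F_s = rng.random((m, n_word_clusters)) + eps
    F_t = rng.random((m, n_word_clusters)) + eps
    G_t = rng.random((n_t, n_classes)) + eps
    S = rng.random((n_word_clusters, n_classes)) + eps

    def loss():
        # Sum of squared reconstruction errors over both domains.
        return (np.linalg.norm(X_s - F_s @ S @ G_s.T) ** 2
                + np.linalg.norm(X_t - F_t @ S @ G_t.T) ** 2)

    history = [loss()]
    for _ in range(n_iter):
        # Word-cluster factors, one per domain.
        F_s *= (X_s @ G_s @ S.T) / (F_s @ S @ G_s.T @ G_s @ S.T + eps)
        F_t *= (X_t @ G_t @ S.T) / (F_t @ S @ G_t.T @ G_t @ S.T + eps)
        # Target document-class factor (the source one stays fixed).
        G_t *= (X_t.T @ F_t @ S) / (G_t @ S.T @ F_t.T @ F_t @ S + eps)
        # Shared association matrix: both domains contribute to its update.
        num = F_s.T @ X_s @ G_s + F_t.T @ X_t @ G_t
        den = (F_s.T @ F_s @ S @ G_s.T @ G_s
               + F_t.T @ F_t @ S @ G_t.T @ G_t + eps)
        S *= num / den
        history.append(loss())
    return F_s, F_t, S, G_t, history

# Synthetic data for demonstration only (30 words, 20 source and 15 target docs).
rng = np.random.default_rng(42)
X_s = rng.random((30, 20))
y_s = np.arange(20) % 2
X_t = rng.random((30, 15))

F_s, F_t, S, G_t, history = joint_tri_factorization(
    X_s, y_s, X_t, n_word_clusters=4, n_classes=2)
y_t_pred = G_t.argmax(axis=1)  # predicted class = largest entry in each row of G_t
print(history[0], history[-1])
```

In this sketch the source labels influence the target predictions only through the shared matrix `S`, which mirrors the role the abstract gives to the stable associations between word clusters and document classes.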