Using Wikipedia for Co-clustering Based Cross-Domain Text Classification

Authors:
Pu Wang;Carlotta Domeniconi;Jian Hu
Affiliations:
-;-;-
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing 0
Cited 7

Co-clustering sentences and terms for multi-document summarization

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
High-order co-clustering text data on semantics-based representation model

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Abstracting for Dimensionality Reduction in Text Classification

International Journal of Intelligent Systems
Enhancing short text clustering with small external repositories

AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Learning person-specific models for facial expression and action unit recognition

Pattern Recognition Letters
Relational term-suggestion graphs incorporating multipartite concept and expertise networks

ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
Transfer learning with one-class data

Pattern Recognition Letters

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional approaches to document classification requires labeled data in order to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available, and often too expensive to obtain. Given a learning task for which training data are not available, abundant labeled data may exist for a different but related domain. One would like to use the related labeled data as auxiliary information to accomplish the classification task in the target domain. Recently, the paradigm of transfer learning has been introduced to enable effective learning strategies when auxiliary data obey a different probability distribution. A co-clustering based classification algorithm has been previously proposed to tackle cross-domain text classification. In this work, we extend the idea underlying this approach by making the latent semantic relationship between the two domains explicit. This goal is achieved with the use of Wikipedia. As a result, the pathway that allows to propagate labels between the two domains not only captures common words, but also semantic concepts based on the content of documents. We empirically demonstrate the efficacy of our semantic-based approach to cross-domain classification using a variety of real data.