Topic-bridged PLSA for cross-domain text classification

Authors:
Gui-Rong Xue;Wenyuan Dai;Qiang Yang;Yong Yu
Affiliations:
Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Hong Kong University of Science and Technology, Hong Kong, Hong Kong;Shanghai Jiao Tong University, Shanghai, China
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Abstract

In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time-consuming, while there may be plenty of labeled data in a related but different domain. Traditional text classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain text classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the text classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art text classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain text classification significantly.
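
The abstract gives no equations, so the following is only a minimal sketch of the plain PLSA EM loop that TPLSA is described as extending. It fits one shared set of topics P(w|z) on labeled source documents and unlabeled target documents together, so the topics can act as the bridge between domains. It is not the authors' TPLSA model: the step that maps topics to classes is a simplified stand-in, and the toy counts and all names (`plsa`, `normalize`, `label_topics`) are assumptions made for illustration.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers so it sums to 1."""
    total = sum(row) or 1.0
    return [v / total for v in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Plain PLSA fitted with EM on a docs x words count matrix.

    Returns P(z|d) as a docs x topics list and P(w|z) as topics x words.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_zd[d][z] += counts[d][w] * post[z]
                    new_wz[z][w] += counts[d][w] * post[z]
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
    return p_z_d, p_w_z


def label_topics(p_z_d_source, labels, n_topics):
    """Assumed stand-in: give each topic the source class that uses it most."""
    classes = sorted(set(labels))
    topic_label = []
    for z in range(n_topics):
        means = {c: sum(row[z] for row, y in zip(p_z_d_source, labels) if y == c)
                    / labels.count(c)
                 for c in classes}
        topic_label.append(max(classes, key=lambda c: means[c]))
    return topic_label


# Toy data (hypothetical): six words, two classes, two related domains.
source_counts = [[4, 3, 2, 0, 0, 0], [3, 4, 1, 0, 1, 0],
                 [0, 0, 1, 3, 4, 2], [0, 1, 0, 4, 2, 3]]
source_labels = [0, 0, 1, 1]
target_counts = [[2, 5, 3, 0, 0, 1], [5, 2, 2, 1, 0, 0],
                 [0, 0, 0, 2, 5, 4], [1, 0, 0, 5, 3, 2]]

n_topics = 2
n_src = len(source_counts)
# Fit one topic model on both domains so P(w|z) is shared between them.
p_z_d, p_w_z = plsa(source_counts + target_counts, n_topics)
topic_label = label_topics(p_z_d[:n_src], source_labels, n_topics)
# Classify each unlabeled target document by its dominant shared topic.
pred = [topic_label[max(range(n_topics), key=lambda z: row[z])]
        for row in p_z_d[n_src:]]
print(pred)
```

Fitting both domains in one call is the only part of this sketch that mirrors the "unified probabilistic model" idea in the abstract. How TPLSA actually injects the source labels into the model is not stated here, so the post hoc topic labeling above should be read as a placeholder.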