Multi-domain active learning for text classification

Authors:
Lianghao Li; Xiaoming Jin; Sinno Jialin Pan; Jian-Tao Sun
Affiliations:
Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; Institute for Infocomm Research, Singapore, Singapore; Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

Active learning has been shown to reduce labeling effort in supervised learning. However, existing active learning work has mainly focused on training models for a single domain. In practical applications, it is common to train classifiers for multiple domains simultaneously. For example, some merchant web sites (such as Amazon.com) may need a set of classifiers to predict the sentiment polarity of product reviews collected from various domains (e.g., electronics, books, shoes). Although different domains have their own unique features, they may share some common latent features. If active learning is applied to each domain separately, some data instances selected from different domains may carry duplicate knowledge because of these common features. Choosing which data from multiple domains to label is therefore crucial to further reducing human labeling effort in multi-domain learning. In this paper, we propose a novel multi-domain active learning framework that jointly selects data instances from all domains while accounting for duplicate information. In our solution, a shared subspace is first learned to represent the common latent features of the different domains. By considering the common and the domain-specific features together, the model loss reduction induced by each data instance can be decomposed into a common part and a domain-specific part. In this way, duplicate information across domains is encoded in the common part of the model loss reduction and taken into account when querying. We compare our method with state-of-the-art active learning approaches on three text classification tasks: sentiment classification, newsgroup classification, and email spam filtering. Experimental results show that our method reduces human labeling effort by 33.2%, 42.9%, and 68.7% on the three tasks, respectively.
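
The idea of splitting each instance's contribution into a common part and a domain-specific part can be illustrated with a small toy sketch. This is not the paper's algorithm: the abstract does not give the loss-reduction formula or the subspace learning procedure, so the sketch assumes a shared subspace is already given as an orthonormal basis and uses a simple norm-based novelty score as a stand-in for model loss reduction. The names `residual_norm`, `joint_select`, `pools`, and `shared_basis` are hypothetical and exist only for this illustration.

```python
import numpy as np

def residual_norm(v, basis):
    """Remove from v its projection onto the orthonormal vectors in `basis`;
    return the norm of what is left and the leftover vector itself."""
    if basis:
        B = np.stack(basis)            # shape (k, d), orthonormal rows
        v = v - B.T @ (B @ v)
    return float(np.linalg.norm(v)), v

def joint_select(pools, shared_basis, budget):
    """Greedily pick (domain, index) pairs across all domains.

    ASSUMPTION: score = novelty of the common part + novelty of the
    domain-specific part. This is a toy proxy for loss reduction, not the
    criterion used in the paper.
    """
    P = shared_basis @ shared_basis.T           # projector onto the shared subspace
    common_seen = []                            # common directions already covered (all domains)
    specific_seen = {d: [] for d in pools}      # domain-specific directions, per domain
    chosen = []
    for _ in range(budget):
        best = None
        for d, X in pools.items():
            for i, x in enumerate(X):
                if (d, i) in chosen:
                    continue
                common = P @ x                  # part explained by shared features
                specific = x - common           # part unique to this domain
                c_gain, c_res = residual_norm(common, common_seen)
                s_gain, s_res = residual_norm(specific, specific_seen[d])
                score = c_gain + s_gain
                if best is None or score > best[0]:
                    best = (score, d, i, c_gain, c_res, s_gain, s_res)
        if best is None:
            break
        _, d, i, c_gain, c_res, s_gain, s_res = best
        chosen.append((d, i))
        # Record the newly covered directions so later duplicates score lower.
        if c_gain > 1e-12:
            common_seen.append(c_res / c_gain)
        if s_gain > 1e-12:
            specific_seen[d].append(s_res / s_gain)
    return chosen

# Toy data: feature 0 is shared; features 1 and 2 are domain-specific.
pools = {
    "books": np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    "electronics": np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),
}
shared_basis = np.array([[1.0], [0.0], [0.0]])
picked = joint_select(pools, shared_basis, budget=3)
print(picked)
```

In this toy run, the first electronics instance carries only shared information that the first books instance already covers, so its common-part score drops to zero and it is never queried. Selecting per domain independently would have picked it first for electronics, which is the kind of duplicated labeling the abstract describes.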