Partially Supervised Classification of Text Documents
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Building Text Classifiers Using Positive and Unlabeled Examples
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Learning and evaluating classifiers under sample selection bias
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Learning to identify unexpected instances in the test set
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Domain adaptation via transfer component analysis
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Discriminative Learning Under Covariate Shift
The Journal of Machine Learning Research
Cross-domain sentiment classification via spectral feature alignment
Proceedings of the 19th international conference on World wide web
IEEE Transactions on Knowledge and Data Engineering
Distributional similarity vs. PU learning for entity set expansion
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Domain adaptation via pseudo in-domain data selection
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Domain adaptation with ensemble of feature groups
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Owing to the explosive growth of online reviews on the Internet, we can easily collect a large number of labeled reviews from different domains, but only some of them are beneficial for training the desired target-domain sentiment classifier. It is therefore important to identify the samples that are most relevant to the target domain and use them as training data. To address this problem, we propose a novel approach based on instance selection and instance weighting via PU learning. PU learning is first used to learn an in-target-domain selector, which assigns an in-target-domain probability to each sample in the training set. For instance selection, the samples with higher in-target-domain probability are used as training data; for instance weighting, the calibrated in-target-domain probabilities are used as sampling weights for training an instance-weighted naive Bayes model, following the principle of maximum weighted likelihood estimation. The experimental results demonstrate the necessity and effectiveness of the approach, especially when the training set is large. They also show that the larger the Kullback-Leibler divergence between the training and test data, the more effective the proposed approach is.
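The pipeline the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the toy reviews are invented, a plain logistic-regression domain discriminator stands in for the PU-learned in-target-domain selector, the median is an assumed selection threshold, and scikit-learn's `sample_weight` option on `MultinomialNB` plays the role of maximum weighted likelihood estimation. A smoothed unigram KL divergence between source and target is included to mirror the final claim.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy data (invented for illustration): labeled source-domain reviews
# and unlabeled target-domain reviews.
source_docs = [
    "slow service and bad food",        # negative
    "great food friendly staff",        # positive
    "terrible battery short life",      # negative
    "sharp screen great battery life",  # positive
]
source_labels = np.array([0, 1, 0, 1])
target_docs = [
    "great camera sharp photos long battery life",
    "terrible camera blurry photos",
    "battery drains fast screen is dim",
]

vec = CountVectorizer()
X = vec.fit_transform(source_docs + target_docs)
X_src, X_tgt = X[: len(source_docs)], X[len(source_docs):]

# Step 1: in-target-domain selector (stand-in for the PU learner):
# discriminate target docs (positive) from source docs and read off
# P(in-target | x) for each source sample.
domain_y = np.r_[np.zeros(len(source_docs)), np.ones(len(target_docs))]
selector = LogisticRegression(max_iter=1000).fit(X, domain_y)
p_in_target = selector.predict_proba(X_src)[:, 1]

# Step 2a: instance selection — keep the source samples with higher
# in-target-domain probability (assumed threshold: the median).
keep = p_in_target >= np.median(p_in_target)
nb_selected = MultinomialNB().fit(X_src[keep], source_labels[keep])

# Step 2b: instance weighting — use the probabilities as sampling
# weights, so each sample's contribution to the likelihood is scaled.
nb_weighted = MultinomialNB().fit(X_src, source_labels,
                                  sample_weight=p_in_target)
predictions = nb_weighted.predict(X_tgt)

# KL divergence between smoothed source and target unigram
# distributions, the quantity the abstract relates to effectiveness.
p = np.asarray(X_src.sum(axis=0)).ravel() + 1.0
q = np.asarray(X_tgt.sum(axis=0)).ravel() + 1.0
p, q = p / p.sum(), q / q.sum()
kl = float(np.sum(p * np.log(p / q)))
```

With real data, the calibration step the abstract mentions would be applied to `p_in_target` before using it as weights; the sketch uses the raw probabilities.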