C4.5: programs for machine learning
Data mining: practical machine learning tools and techniques with Java implementations
Mining the peanut gallery: opinion extraction and semantic classification of product reviews
WWW '03 Proceedings of the 12th international conference on World Wide Web
Tree Induction for Probability-Based Ranking
Machine Learning
Comparing Naive Bayes, Decision Trees, and SVM with AUC and Accuracy
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Development and use of a gold-standard data set for subjectivity classifications
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Bootstrapping statistical parsers from small datasets
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning subjective nouns using extraction pattern bootstrapping
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Learning extraction patterns for subjective expressions
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
OpinionFinder: a system for subjectivity analysis
HLT-Demo '05 Proceedings of HLT/EMNLP on Interactive Demonstrations
Exploiting subjectivity classification to improve information extraction
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
Improved heterogeneous distance functions
Journal of Artificial Intelligence Research
Automatically generating extraction patterns from untagged text
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
Creating subjective and objective sentence classifiers from unannotated texts
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
The unsymmetrical-style co-training
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Instance selection in semi-supervised learning
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
Canadian AI'12 Proceedings of the 25th Canadian conference on Advances in Artificial Intelligence
Recent natural language processing (NLP) research shows that identifying and extracting subjective information from texts can benefit many NLP applications. In this paper, we investigate a semi-supervised learning approach, self-training, for sentence-level subjectivity classification. In self-training, the confidence degree, which depends on the ranking of class membership probabilities, is commonly used as the selection metric that ranks and selects unlabeled instances for the next training iteration of the underlying classifier. Naive Bayes (NB) is often used as the underlying classifier because its class membership probability estimates rank well. The first contribution of this paper is a study of the performance of self-training with decision tree models, such as C4.5, C4.4, and naive Bayes tree (NBTree), as the underlying classifiers. The second contribution is an adapted Value Difference Metric (VDM), which does not depend on class membership probabilities, as the selection metric in self-training. Based on the Multi-Perspective Question Answering (MPQA) corpus, a set of experiments was designed to compare the performance of self-training with different underlying classifiers and selection metrics under various conditions. The experimental results show that self-training performs better with VDM than with the confidence degree, and that self-training with NBTree and VDM outperforms self-training with the other combinations of underlying classifiers and selection metrics. The results also show that the self-training approach can achieve performance comparable to that of the supervised learning models.
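The confidence-degree self-training scheme described above can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the categorical naive Bayes model, the smoothing constants, and all function names are assumptions chosen for self-containment.

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """Fit a categorical naive Bayes model: class priors and
    per-class, per-feature value counts."""
    classes = Counter(y)                              # class -> count
    counts = defaultdict(lambda: defaultdict(Counter))  # class -> feature -> value -> count
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[c][j][v] += 1
    return classes, counts

def predict_proba(model, x):
    """Return normalized class membership probabilities for instance x,
    using simple Laplace-style smoothing (an illustrative choice)."""
    classes, counts = model
    n = sum(classes.values())
    scores = {}
    for c, nc in classes.items():
        logp = math.log(nc / n)
        for j, v in enumerate(x):
            cnt = counts[c][j]
            logp += math.log((cnt[v] + 1) / (nc + len(cnt) + 1))
        scores[c] = logp
    z = max(scores.values())                          # log-sum-exp for stability
    exps = {c: math.exp(s - z) for c, s in scores.items()}
    tot = sum(exps.values())
    return {c: e / tot for c, e in exps.items()}

def self_train(L_X, L_y, U, rounds=5, per_round=2):
    """Confidence-degree self-training: each round, auto-label the
    unlabeled instances the current model is most confident about,
    move them into the labeled set, and retrain."""
    L_X, L_y, U = list(L_X), list(L_y), list(U)
    for _ in range(rounds):
        if not U:
            break
        model = train_nb(L_X, L_y)
        scored = []
        for x in U:
            p = predict_proba(model, x)
            c, conf = max(p.items(), key=lambda kv: kv[1])
            scored.append((conf, x, c))
        scored.sort(key=lambda t: -t[0])              # highest confidence first
        for conf, x, c in scored[:per_round]:
            L_X.append(x)
            L_y.append(c)
            U.remove(x)
    return train_nb(L_X, L_y)
```

The loop mirrors the selection step the abstract describes: the confidence degree is simply the largest class membership probability, so any classifier whose probability estimates rank poorly will feed the learner poorly chosen auto-labeled instances.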
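The Value Difference Metric used as the alternative selection metric rests on a simple idea: two attribute values are close when they induce similar class distributions in the labeled data. A minimal stdlib sketch of the standard VDM (with exponent q), not the paper's adapted variant, might look like this; the table layout and names are assumptions.

```python
from collections import Counter, defaultdict

def vdm_tables(X, y, q=2):
    """Precompute, from labeled data, the conditional class counts
    P(c | attribute j takes value v) that VDM compares."""
    classes = sorted(set(y))
    cond = defaultdict(Counter)   # (feature j, value v) -> class -> count
    tot = Counter()               # (feature j, value v) -> count
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            cond[(j, v)][c] += 1
            tot[(j, v)] += 1
    return classes, cond, tot, q

def vdm_distance(tables, a, b):
    """Sum over attributes and classes of |P(c|v_a) - P(c|v_b)|^q:
    zero when both instances' values induce identical class
    distributions, larger as the distributions diverge."""
    classes, cond, tot, q = tables
    d = 0.0
    for j, (va, vb) in enumerate(zip(a, b)):
        for c in classes:
            pa = cond[(j, va)][c] / tot[(j, va)] if tot[(j, va)] else 0.0
            pb = cond[(j, vb)][c] / tot[(j, vb)] if tot[(j, vb)] else 0.0
            d += abs(pa - pb) ** q
    return d
```

Because the distance is built from empirical value-class co-occurrence counts rather than from the classifier's probability outputs, a VDM-based selection metric does not inherit the poor probability calibration of models such as C4.5, which is the motivation the abstract gives for adopting it.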