C4.5: programs for machine learning
Data mining: practical machine learning tools and techniques with Java implementations
Mining the peanut gallery: opinion extraction and semantic classification of product reviews
WWW '03 Proceedings of the 12th international conference on World Wide Web
Tree Induction for Probability-Based Ranking
Machine Learning
Comparing Naive Bayes, Decision Trees, and SVM with AUC and Accuracy
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Development and use of a gold-standard data set for subjectivity classifications
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Bootstrapping statistical parsers from small datasets
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning subjective nouns using extraction pattern bootstrapping
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Learning extraction patterns for subjective expressions
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
OpinionFinder: a system for subjectivity analysis
HLT-Demo '05 Proceedings of HLT/EMNLP on Interactive Demonstrations
Exploiting subjectivity classification to improve information extraction
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
Improved heterogeneous distance functions
Journal of Artificial Intelligence Research
Automatically generating extraction patterns from untagged text
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
Creating subjective and objective sentence classifiers from unannotated texts
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
The unsymmetrical-style co-training
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Instance selection in semi-supervised learning
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
Canadian AI'12 Proceedings of the 25th Canadian conference on Advances in Artificial Intelligence
Recent natural language processing (NLP) research shows that identifying and extracting subjective information from texts can benefit many NLP applications. In this paper, we investigate a semi-supervised learning approach, self-training, for sentence-level subjectivity classification. In self-training, the confidence degree, which depends on the ranking of class membership probabilities, is commonly used as the selection metric that ranks and selects unlabeled instances for the next training iteration of the underlying classifier. Naive Bayes (NB) is often used as the underlying classifier because its class membership probability estimates rank well. The first contribution of this paper is a study of the performance of self-training with decision tree models, such as C4.5, C4.4, and naive Bayes tree (NBTree), as the underlying classifiers. The second contribution is an adapted Value Difference Metric (VDM), which does not depend on class membership probabilities, as the selection metric in self-training. Based on the Multi-Perspective Question Answering (MPQA) corpus, a set of experiments was designed to compare the performance of self-training with different underlying classifiers and selection metrics under various conditions. The experimental results show that self-training performs better with VDM than with the confidence degree, and that self-training with NBTree and VDM outperforms self-training with the other combinations of underlying classifiers and selection metrics. The results also show that the self-training approach can achieve performance comparable to that of the supervised learning models.
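The confidence-degree self-training scheme described above can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the categorical naive Bayes model, the smoothing constants, and all function names are assumptions chosen for self-containment.

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """Fit a categorical naive Bayes model: class priors and
    per-class, per-feature value counts."""
    classes = Counter(y)                              # class -> count
    counts = defaultdict(lambda: defaultdict(Counter))  # class -> feature -> value -> count
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[c][j][v] += 1
    return classes, counts

def predict_proba(model, x):
    """Return normalized class membership probabilities for instance x,
    using simple Laplace-style smoothing (an illustrative choice)."""
    classes, counts = model
    n = sum(classes.values())
    scores = {}
    for c, nc in classes.items():
        logp = math.log(nc / n)
        for j, v in enumerate(x):
            cnt = counts[c][j]
            logp += math.log((cnt[v] + 1) / (nc + len(cnt) + 1))
        scores[c] = logp
    z = max(scores.values())                          # log-sum-exp for stability
    exps = {c: math.exp(s - z) for c, s in scores.items()}
    tot = sum(exps.values())
    return {c: e / tot for c, e in exps.items()}

def self_train(L_X, L_y, U, rounds=5, per_round=2):
    """Confidence-degree self-training: each round, auto-label the
    unlabeled instances the current model is most confident about,
    move them into the labeled set, and retrain."""
    L_X, L_y, U = list(L_X), list(L_y), list(U)
    for _ in range(rounds):
        if not U:
            break
        model = train_nb(L_X, L_y)
        scored = []
        for x in U:
            p = predict_proba(model, x)
            c, conf = max(p.items(), key=lambda kv: kv[1])
            scored.append((conf, x, c))
        scored.sort(key=lambda t: -t[0])              # highest confidence first
        for conf, x, c in scored[:per_round]:
            L_X.append(x)
            L_y.append(c)
            U.remove(x)
    return train_nb(L_X, L_y)
```

The loop mirrors the selection step the abstract describes: the confidence degree is simply the largest class membership probability, so any classifier whose probability estimates rank poorly will feed the learner poorly chosen auto-labeled instances.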
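The Value Difference Metric used as the alternative selection metric rests on a simple idea: two attribute values are close when they induce similar class distributions in the labeled data. A minimal stdlib sketch of the standard VDM (with exponent q), not the paper's adapted variant, might look like this; the table layout and names are assumptions.

```python
from collections import Counter, defaultdict

def vdm_tables(X, y, q=2):
    """Precompute, from labeled data, the conditional class counts
    P(c | attribute j takes value v) that VDM compares."""
    classes = sorted(set(y))
    cond = defaultdict(Counter)   # (feature j, value v) -> class -> count
    tot = Counter()               # (feature j, value v) -> count
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            cond[(j, v)][c] += 1
            tot[(j, v)] += 1
    return classes, cond, tot, q

def vdm_distance(tables, a, b):
    """Sum over attributes and classes of |P(c|v_a) - P(c|v_b)|^q:
    zero when both instances' values induce identical class
    distributions, larger as the distributions diverge."""
    classes, cond, tot, q = tables
    d = 0.0
    for j, (va, vb) in enumerate(zip(a, b)):
        for c in classes:
            pa = cond[(j, va)][c] / tot[(j, va)] if tot[(j, va)] else 0.0
            pb = cond[(j, vb)][c] / tot[(j, vb)] if tot[(j, vb)] else 0.0
            d += abs(pa - pb) ** q
    return d
```

Because the distance is built from empirical value-class co-occurrence counts rather than from the classifier's probability outputs, a VDM-based selection metric does not inherit the poor probability calibration of models such as C4.5, which is the motivation the abstract gives for adopting it.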