Question classification based on co-training style semi-supervised learning

  • Authors:
  • Zhengtao Yu;Lei Su;Lina Li;Quan Zhao;Cunli Mao;Jianyi Guo

  • Affiliations:
  • The School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China and The Institute of Intelligent Information Processing, Computer Technolo ...
  • Department of Software, Yunnan University, Kunming 650091, China
  • The School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China
  • The School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China
  • The School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China
  • The School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China and The Institute of Intelligent Information Processing, Computer Technolo ...

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2010

Quantified Score

Hi-index 0.10

Abstract

In statistical question classification, semi-supervised learning, which can exploit abundant unlabeled samples, has received substantial attention in recent years. In this paper, a novel question classification approach based on co-training style semi-supervised learning is proposed. In particular, the method extracts high-frequency keywords as classification features and uses word semantic similarity to adjust the feature weights. The classifiers are initially trained on labeled data, and the learned models are then refined with unlabeled data, which are labeled and added to the training set when the classifiers agree on the labeling. Experiments on a Chinese question answering system in the tourism domain were conducted with different feature selection methods, different supervised and semi-supervised algorithms, different feature dimensions, and different unlabeled rates. The experimental results show that the proposed method effectively improves classification accuracy. Specifically, with 40% of the training set unlabeled, the average accuracy reaches 88.9% on coarse question types and 78.2% on fine types, an improvement of around 2-4 percentage points.
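
The co-training loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes two hypothetical feature views per question (e.g. high-frequency keyword counts and semantically re-weighted features), uses Naive Bayes classifiers as stand-ins, and adopts an invented confidence threshold for the agreement criterion; the paper's exact feature extraction and agreement test are not reproduced here.

```python
# Minimal co-training sketch (assumptions: two feature views, Naive Bayes
# base classifiers, and a hypothetical confidence threshold for agreement).
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X_view1, X_view2, y, labeled_idx, unlabeled_idx,
             max_rounds=10, conf_threshold=0.8):
    """Train one classifier per feature view; move unlabeled questions into
    the labeled pool only when both classifiers agree with high confidence."""
    labeled = list(labeled_idx)
    unlabeled = list(unlabeled_idx)
    clf1, clf2 = MultinomialNB(), MultinomialNB()

    for _ in range(max_rounds):
        # 1. Train each view's classifier on the current labeled pool.
        clf1.fit(X_view1[labeled], y[labeled])
        clf2.fit(X_view2[labeled], y[labeled])
        if not unlabeled:
            break

        # 2. Predict class probabilities for the unlabeled pool in both views.
        p1 = clf1.predict_proba(X_view1[unlabeled])
        p2 = clf2.predict_proba(X_view2[unlabeled])
        lab1, lab2 = p1.argmax(axis=1), p2.argmax(axis=1)

        # 3. Keep samples where the classifiers agree and are both confident.
        agree = (
            (lab1 == lab2)
            & (p1.max(axis=1) >= conf_threshold)
            & (p2.max(axis=1) >= conf_threshold)
        )
        if not agree.any():
            break

        # 4. Add the agreed samples to the labeled pool with predicted labels.
        newly = [unlabeled[i] for i in np.flatnonzero(agree)]
        y[newly] = clf1.classes_[lab1[agree]]
        labeled.extend(newly)
        unlabeled = [u for u in unlabeled if u not in set(newly)]

    return clf1, clf2
```

In this sketch the two views play the role of the keyword-based and similarity-adjusted feature representations mentioned in the abstract; any classifier exposing predict_proba could replace MultinomialNB.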