Using semi-supervised learning for question classification

Authors:
Nguyen Thanh Tri;Nguyen Minh Le;Akira Shimazu
Affiliations:
School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan;School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan;School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Venue:
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
Year:
2006

References (8):
Support-Vector Networks. Machine Learning.
A maximum entropy approach to natural language processing. Computational Linguistics.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
Enhancing Supervised Learning with Unlabeled Data. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Identifying and Handling Mislabelled Instances. Journal of Intelligent Information Systems.
Learning question classifiers. COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1.
Question classification with support vector machines and error correcting codes. NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2.
Tri-Training: Exploiting Unlabeled Data Using Three Classifiers. IEEE Transactions on Knowledge and Data Engineering.

Cited by (2):
CoCQA: co-training over questions and answers with an application to predicting question subjectivity orientation. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
A graph-based semi-supervised learning for question-answering. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2.

Abstract

This paper uses unlabelled questions in combination with labelled questions for semi-supervised learning, with the aim of improving performance on the question classification task. We also propose two modifications to Tri-training, a simple but efficient co-training-style algorithm, to make it more suitable for question data. Both modifications avoid bootstrap-sampling the training set to obtain different training sets for the three classifiers: the first uses multiple learning algorithms for the classifiers in Tri-training, and the second uses multiple learning algorithms in combination with multiple views. These modifications prevent the error rate at the initial step from increasing, and our experiments show promising results.
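
As an illustration of the idea behind Tri-training, the following minimal Python sketch shows only the agreement step: an unlabelled question is pseudo-labelled for one classifier when the other two classifiers predict the same label for it. It is not the authors' implementation. The three rule-based functions, the label names, and the example questions are hypothetical stand-ins for three different learning algorithms trained on the same labelled set, and the error-rate conditions that Tri-training applies before accepting new labels are omitted.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier maps a question string to a predicted class label.
Classifier = Callable[[str], str]


def by_wh_word(question: str) -> str:
    """Hypothetical stand-in 1: decide from the first word only."""
    tokens = question.lower().split()
    first = tokens[0] if tokens else ""
    return {"who": "person", "where": "place", "when": "number"}.get(first, "other")


def by_keyword(question: str) -> str:
    """Hypothetical stand-in 2: decide from keywords anywhere in the question."""
    tokens = set(question.lower().split())
    if tokens & {"wrote", "president"}:
        return "person"
    if tokens & {"city", "country"}:
        return "place"
    if tokens & {"year", "many"}:
        return "number"
    return "other"


def by_head_bigram(question: str) -> str:
    """Hypothetical stand-in 3: decide from the first two words."""
    tokens = question.lower().split()
    if " ".join(tokens[:2]) in {"how many", "what year"}:
        return "number"
    if tokens[:1] == ["who"]:
        return "person"
    return "other"


def agreed_labels(
    classifiers: Sequence[Classifier],
    unlabelled: Sequence[str],
    target: int,
) -> List[Tuple[str, str]]:
    """Pseudo-label questions for classifier `target` using the other two.

    A question is kept only when both of the other classifiers assign it
    the same label. The error-rate checks of Tri-training are left out.
    """
    if len(classifiers) != 3 or not 0 <= target < 3:
        raise ValueError("expected three classifiers and a target in 0..2")
    first, second = (c for i, c in enumerate(classifiers) if i != target)
    newly_labelled = []
    for question in unlabelled:
        label = first(question)
        if label == second(question):
            newly_labelled.append((question, label))
    return newly_labelled


CLASSIFIERS = [by_wh_word, by_keyword, by_head_bigram]
QUESTIONS = [
    "Who wrote Hamlet ?",
    "Where is the city of Hanoi ?",
    "What year did the war end ?",
]
```

Calling `agreed_labels(CLASSIFIERS, QUESTIONS, target=2)` returns the questions on which the first two stand-ins agree; in a full training loop these pairs would be added to the third classifier's training data before it is retrained. Because the three classifiers differ by construction, all of them can start from the same labelled set, which is the motivation the abstract gives for dropping bootstrap sampling.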