Chinese chunking with tri-training learning

Authors:
Wenliang Chen;Yujie Zhang;Hitoshi Isahara
Affiliations:
Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto, Japan;Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto, Japan;Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto, Japan
Venue:
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
Year:
2006

Abstract

This paper presents a practical tri-training method for Chinese chunking that uses a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training learning in which newly labeled sentences are selected by comparing the agreement among three classifiers. Specifically, in each iteration, a new sample is selected for a classifier if the other two classifiers agree on its labels while the classifier itself disagrees. We compare the proposed tri-training approach with the co-training approach on the UPenn Chinese Treebank V4.0 (CTB4). The experimental results show that the proposed approach can improve performance significantly.
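
The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that agreement is checked at the sentence level, by comparing whole chunk-tag sequences, and that each classifier's output is already available as a list of tag tuples. The function name `select_new_samples`, the data layout, and the tag names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One predicted chunk-tag sequence for one sentence (tag names are illustrative).
Labels = Tuple[str, ...]


def select_new_samples(
    predictions: Sequence[Sequence[Labels]], target: int
) -> List[Tuple[int, Labels]]:
    """Pick newly labeled sentences for the classifier at index `target`.

    `predictions[k][i]` is classifier k's tag sequence for unlabeled sentence i.
    Assumption: a sentence is selected when the other two classifiers output
    identical tag sequences and the target classifier's sequence differs.
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    first, second = [k for k in range(3) if k != target]

    selected = []
    for idx, own in enumerate(predictions[target]):
        agreed = predictions[first][idx]
        # The other two must agree with each other, and the target must disagree.
        if agreed == predictions[second][idx] and own != agreed:
            selected.append((idx, agreed))
    return selected


# Toy predictions from three classifiers on three unlabeled sentences.
preds = [
    [("B-NP", "I-NP"), ("B-NP", "B-VP"), ("B-NP", "O")],  # classifier 0
    [("B-NP", "I-NP"), ("B-NP", "I-NP"), ("B-NP", "O")],  # classifier 1
    [("B-NP", "I-NP"), ("B-NP", "I-NP"), ("B-VP", "O")],  # classifier 2
]

for k in range(3):
    print(k, select_new_samples(preds, k))
```

In this toy input, sentence 1 is selected for classifier 0 and sentence 2 for classifier 2, each with the tag sequence the other two classifiers agreed on. Sentences on which all three classifiers agree are not selected for any of them under this reading of the rule.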