Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Although further effort has been spent on improving chunking performance, the experimental data has, with few exceptions, been restricted to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how these successful chunking technologies extend to other data that differs in genre/domain and/or in the amount of annotation available. In this paper we first train chunkers with three classifiers on three different data sets and test on four data sets. We also vary the size of the training data systematically to establish data requirements for chunkers. It turns out that there is no significant difference between these state-of-the-art classifiers; that training on plentiful data from the same corpus (Switchboard) yields results comparable to Wall Street Journal chunkers even though the underlying material is spoken; and that performance matching that of a large amount of unmatched training data can be achieved with a very modest amount of matched training data.
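The abstract describes systematically varying training-set size to measure the data requirements of chunkers. As a minimal, self-contained sketch of such a learning-curve experiment (using a hypothetical most-frequent-tag baseline over BIO-tagged sentences rather than the classifiers evaluated in the paper, and chunk-level F1 as in the CoNLL-2000 evaluation):

```python
from collections import Counter, defaultdict

def train_baseline(sents):
    """Train a most-frequent-tag baseline from [(word, BIO-tag), ...] sentences."""
    counts = defaultdict(Counter)
    for sent in sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def predict(model, sent):
    """Tag each word with its most frequent training tag; unseen words get 'O'."""
    return [model.get(w, "O") for w, _ in sent]

def chunk_spans(tags):
    """Extract (start, end, type) chunk spans from a BIO tag sequence."""
    spans, start, ctype = [], None, None
    for i, t in enumerate(list(tags) + ["O"]):  # sentinel closes a final chunk
        if start is not None and (
            t == "O" or t.startswith("B-") or (t.startswith("I-") and t[2:] != ctype)
        ):
            spans.append((start, i, ctype))
            start = ctype = None
        if t.startswith("B-") or (t.startswith("I-") and start is None):
            start, ctype = i, t[2:]
    return spans

def chunk_f1(gold_sents, pred_tags_list):
    """Chunk-level F1: a chunk counts only if span and type both match exactly."""
    tp = fp = fn = 0
    for sent, pred in zip(gold_sents, pred_tags_list):
        gold = set(chunk_spans([t for _, t in sent]))
        hyp = set(chunk_spans(pred))
        tp += len(gold & hyp)
        fp += len(hyp - gold)
        fn += len(gold - hyp)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def learning_curve(train, test, sizes):
    """Retrain on growing prefixes of the training data; report (size, F1) pairs."""
    results = []
    for k in sizes:
        model = train_baseline(train[:k])
        preds = [predict(model, s) for s in test]
        results.append((k, chunk_f1(test, preds)))
    return results
```

With real corpora, `train` and `test` would be matched or unmatched BIO-tagged data sets, and the `(size, F1)` pairs trace out the data-requirement curve discussed in the abstract.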