Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Although further effort has been spent on improving chunking performance, the experimental data has, with few exceptions, been restricted to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how these successful chunking technologies extend to other data that differs in genre/domain and/or in the amount of annotation available. In this paper we first train chunkers with three classifiers on three different data sets and test on four data sets. We also vary the size of the training data systematically to establish data requirements for chunkers. It turns out that there is no significant difference between these state-of-the-art classifiers; that training on plentiful data from the same corpus (Switchboard) yields results comparable to Wall Street Journal chunkers even though the underlying material is spoken; and that performance matching that of a large amount of unmatched training data can be achieved with a very modest amount of matched training data.
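The abstract describes systematically varying training-set size to measure the data requirements of chunkers. As a minimal, self-contained sketch of such a learning-curve experiment (using a hypothetical most-frequent-tag baseline over BIO-tagged sentences rather than the classifiers evaluated in the paper, and chunk-level F1 as in the CoNLL-2000 evaluation):

```python
from collections import Counter, defaultdict

def train_baseline(sents):
    """Train a most-frequent-tag baseline from [(word, BIO-tag), ...] sentences."""
    counts = defaultdict(Counter)
    for sent in sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def predict(model, sent):
    """Tag each word with its most frequent training tag; unseen words get 'O'."""
    return [model.get(w, "O") for w, _ in sent]

def chunk_spans(tags):
    """Extract (start, end, type) chunk spans from a BIO tag sequence."""
    spans, start, ctype = [], None, None
    for i, t in enumerate(list(tags) + ["O"]):  # sentinel closes a final chunk
        if start is not None and (
            t == "O" or t.startswith("B-") or (t.startswith("I-") and t[2:] != ctype)
        ):
            spans.append((start, i, ctype))
            start = ctype = None
        if t.startswith("B-") or (t.startswith("I-") and start is None):
            start, ctype = i, t[2:]
    return spans

def chunk_f1(gold_sents, pred_tags_list):
    """Chunk-level F1: a chunk counts only if span and type both match exactly."""
    tp = fp = fn = 0
    for sent, pred in zip(gold_sents, pred_tags_list):
        gold = set(chunk_spans([t for _, t in sent]))
        hyp = set(chunk_spans(pred))
        tp += len(gold & hyp)
        fp += len(hyp - gold)
        fn += len(gold - hyp)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def learning_curve(train, test, sizes):
    """Retrain on growing prefixes of the training data; report (size, F1) pairs."""
    results = []
    for k in sizes:
        model = train_baseline(train[:k])
        preds = [predict(model, s) for s in test]
        results.append((k, chunk_f1(test, preds)))
    return results
```

With real corpora, `train` and `test` would be matched or unmatched BIO-tagged data sets, and the `(size, F1)` pairs trace out the data-requirement curve discussed in the abstract.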