High-accuracy annotation and parsing of CHILDES transcripts

Authors:
Kenji Sagae; Eric Davis; Alon Lavie; Brian MacWhinney; Shuly Wintner
Affiliations:
University of Tokyo, Bunkyo-ku, Tokyo, Japan; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; University of Haifa, Haifa, Israel
Venue:
CACLA '07 Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
Year:
2007

Citing 6

A maximum entropy approach to natural language processing
Computational Linguistics

A maximum-entropy-inspired parser
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference

Automatic measurement of syntactic development in child language
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

A best-first probabilistic shift-reduce parser
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

CoNLL-X shared task on multilingual dependency parsing
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning

Labeled pseudo-projective dependency parsing with support vector machines
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning

Cited 8

Formal Grammars of Early Language
Languages: From Formal to Natural

Wide-coverage parsing of speech transcripts
IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies

Measuring language development in early childhood education: a case study of grammar checking in child language transcripts
IUNLPBEA '11 Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications

Computational models of language acquisition
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing

The PASCAL Challenge on Grammar Induction
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Combining the sparsity and unambiguity biases for grammar induction
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Automatically learning measures of child language development
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2

The Hebrew CHILDES corpus: transcription and morphological analysis
Language Resources and Evaluation

Abstract

Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To date, we have produced a corpus of over 65,000 words with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for English CHILDES data. The parser and the manually annotated data are freely available for research purposes.