High-accuracy annotation and parsing of CHILDES transcripts

  • Authors:
  • Kenji Sagae; Eric Davis; Alon Lavie; Brian MacWhinney; Shuly Wintner

  • Affiliations:
  • University of Tokyo, Bunkyo-ku, Tokyo, Japan; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; University of Haifa, Haifa, Israel

  • Venue:
  • CACLA '07: Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
  • Year:
  • 2007

Abstract

Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To date, we have produced a corpus of over 65,000 words with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for English CHILDES data. The parser and the manually annotated data are freely available for research purposes.
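As a rough illustration of what a labeled dependency structure is, the following sketch encodes a short utterance as (word index, head index, relation label) triples and computes labeled attachment score, a standard accuracy measure for dependency parsers. It is a sketch under stated assumptions, not the project's parser or released tools: the `index|head|LABEL` string format is modeled loosely on the CHAT %gra tier, the labels SUBJ, ROOT, OBJ and JCT are illustrative, and the helper names `parse_gra`, `is_well_formed` and `labeled_attachment_score` are hypothetical.

```python
from typing import List, NamedTuple

# Hypothetical helpers. The "index|head|LABEL" format is an assumption
# modeled loosely on the CHAT %gra tier; labels are illustrative only.

class Dep(NamedTuple):
    """One labeled dependency: word index, head index (0 = root), relation label."""
    index: int
    head: int
    label: str

def parse_gra(tier: str) -> List[Dep]:
    """Parse a whitespace-separated 'index|head|LABEL' string into Dep triples."""
    deps = []
    for token in tier.split():
        idx, head, label = token.split("|")
        deps.append(Dep(int(idx), int(head), label))
    return deps

def is_well_formed(deps: List[Dep]) -> bool:
    """Check indices are 1..n in order, heads lie in 0..n, no word heads itself,
    and exactly one word attaches to the root. (Longer cycles are not checked.)"""
    n = len(deps)
    if [d.index for d in deps] != list(range(1, n + 1)):
        return False
    if any(not (0 <= d.head <= n) or d.head == d.index for d in deps):
        return False
    return sum(1 for d in deps if d.head == 0) == 1

def labeled_attachment_score(gold: List[Dep], pred: List[Dep]) -> float:
    """Fraction of words whose predicted head AND label both match the gold annotation."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same words")
    correct = sum(1 for g, p in zip(gold, pred)
                  if g.head == p.head and g.label == p.label)
    return correct / len(gold)

# "we eat cookies": 'we' is the subject of 'eat', 'cookies' is its object.
gold = parse_gra("1|2|SUBJ 2|0|ROOT 3|2|OBJ")
# A hypothetical parser output that attaches 'cookies' correctly but mislabels it.
pred = parse_gra("1|2|SUBJ 2|0|ROOT 3|2|JCT")

print(is_well_formed(gold))
print(labeled_attachment_score(gold, pred))
```

Here the predicted analysis gets the head right for all three words but one label wrong, so two of three words count as correct. Scoring heads and labels together is what makes the measure "labeled"; ignoring the label would give the unlabeled attachment score instead.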