Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank

  • Authors:
  • Daniel Tse; James R. Curran

  • Affiliations:
  • University of Sydney; University of Sydney

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000-word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chinese Treebank (PCTB). We design parsimonious CCG analyses for a range of Chinese syntactic constructions, and transform the PCTB trees to produce them. Our process yields a corpus of 27,759 derivations, covering 98.1% of the PCTB.
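For readers unfamiliar with CCG, the sketch below shows the two most basic combinators from which CCG derivations are built: forward application (X/Y Y ⇒ X) and backward application (Y X\Y ⇒ X). It is only an illustration of the formalism, not the conversion procedure used to build Chinese CCGbank. The class and function names (`Atom`, `Functor`, `combine`) are hypothetical, and the transitive-verb category for 喜欢 'like' is a simplified textbook category without the features an actual corpus category would carry.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as NP or S (hypothetical representation)."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks its argument to the right,
    result\\arg seeks its argument to the left."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application (>): X/Y  Y  =>  X."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application (<): Y  X\\Y  =>  X."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def combine(left: Category, right: Category) -> Optional[Category]:
    """Try forward application, then backward application; None if neither applies."""
    return forward_apply(left, right) or backward_apply(left, right)


# Toy derivation for 我 喜欢 猫 'I like cats' (simplified categories, no features).
NP, S = Atom("NP"), Atom("S")
tv = Functor(Functor(S, "\\", NP), "/", NP)  # 喜欢 'like': (S\NP)/NP
vp = combine(tv, NP)     # 喜欢 猫 => S\NP by forward application
sent = combine(NP, vp)   # 我 [喜欢 猫] => S by backward application
print(vp, sent)          # prints: (S\NP) S
```

Treating categories as immutable dataclasses means structural equality comes for free, so checking whether a functor's argument matches its neighbour is a plain `==` comparison.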