Building a parallel bilingual syntactically annotated corpus

  • Authors:
  • Jan Cuřín;Martin Čmejrek;Jiří Havelka;Vladislav Kuboň

  • Affiliations:
  • Center for Computational Linguistics, Charles University in Prague;Center for Computational Linguistics, Charles University in Prague;Institute of Formal and Applied Linguistics, Charles University in Prague;Institute of Formal and Applied Linguistics, Charles University in Prague

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

This paper describes the process of building a bilingual syntactically annotated corpus, the PCEDT (Prague Czech-English Dependency Treebank). The corpus is being created at Charles University, Prague, and its release as a Linguistic Data Consortium data collection is scheduled for the spring of 2004. The paper discusses important decisions made prior to the start of the project and gives an overview of all kinds of resources included in the PCEDT.