As interest in annotated corpora spreads, there is growing interest in using existing language technology for corpus processing. In this paper we explore the idea of using natural language generation systems for corpus annotation. Resources for generation systems often focus on areas of linguistic variability that are under-represented in analysis-directed approaches. Making use of generation resources therefore promises significant extensions to the kinds of annotation information that can be captured. We focus on the use of the KPML (Komet-Penman MultiLingual) generation system for corpus annotation. We describe the kinds of linguistic information covered in KPML and show the steps involved in creating a standard XML corpus representation from KPML's generation output.