Multiple-step treebank conversion: from dependency to Penn format

  • Authors:
  • Cristina Bosco

  • Affiliations:
  • Università di Torino, Torino - Italia

  • Venue:
  • LAW '07 Proceedings of the Linguistic Annotation Workshop
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Whilst the degree to which a treebank subscribes to a specific linguistic theory limits the usefulness of the resource, the availability of more formats for the same resource plays a crucial role both in NLP and linguistics. Conversion tools and multi-format treebanks are useful for investigating portability of NLP systems and validity of annotation. Unfortunately, conversion is a quite complex task since it involves grammatical rules and linguistic knowledge to be incorporated into the converter program. The paper focusses on a methodology for treebank conversion which consists in splitting the process in steps corresponding to the kinds of information that have to be converted, i.e. morphological, structural or relational syntactic. The advantage is the generation of a set of parallel treebanks featuring progressively differentiated formats. An application to the case of an Italian dependency-based treebank in a Penn like format is described.