Annotating coordination in the Penn Treebank

  • Authors:
  • Wolfgang Maier;Erhard Hinrichs;Sandra Kübler;Julia Krivanek

  • Affiliations:
  • Universität Düsseldorf Institut für Sprache und Information;Universität Tübingen Seminar für Sprachwissenschaft;Indiana University;Universität Tübingen Seminar für Sprachwissenschaft

  • Venue:
  • LAW VI '12 Proceedings of the Sixth Linguistic Annotation Workshop
  • Year:
  • 2012

Abstract

Finding coordinations provides useful information for many NLP endeavors. However, the task has not received much attention in the literature. A major reason is that the major treebanks do not annotate coordination reliably. This makes it virtually impossible to detect coordinations in which two conjuncts are separated by punctuation rather than by a coordinating conjunction. In this paper, we present an annotation scheme for the Penn Treebank which introduces a distinction between coordinating and non-coordinating punctuation. We discuss the general annotation guidelines as well as problematic cases. Finally, we show that this additional annotation allows the retrieval of a considerable number of coordinate structures beyond the ones having a coordinating conjunction.