Panel session: discourse annotation

  • Authors:
  • Manfred Stede; Janyce Wiebe; Eva Hajičová; Brian Reese; Simone Teufel; Bonnie Webber; Theresa Wilson

  • Affiliations:
  • University of Potsdam; University of Pittsburgh; Physics Charles University; University of Texas at Austin; University of Cambridge; University of Edinburgh; University of Pittsburgh

  • Venue:
  • LAW '07 Proceedings of the Linguistic Annotation Workshop
  • Year:
  • 2007

Abstract

The classical "success story" of corpus annotation is that of the syntax treebanks, which provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations on the discourse level, i.e., information that reaches beyond sentence boundaries. Phenomena that have been annotated include coreference links, the scope of connectives, and coherence relations. For many of these phenomena there is no general agreement in the research community on how they should be handled, and therefore "recycling" corpora, i.e., having other people reuse them for other purposes, is often difficult. (To some extent, this is due to the fact that discourse annotation deals "only" with surface reflections of underlying, abstract objects.) At the same time, the effort needed to build high-quality discourse corpora is considerable, and thus one should be careful in deciding how to invest it. One way of providing added value with annotation projects is that of shared corpora: if a variety of annotation efforts is carried out on the same primary data, the resulting series of annotation levels can yield insights that the creators of the individual levels had not explicitly planned for. A clear case is the relationship between coherence relations and connective use: when both levels are marked individually and with independent annotation guidelines, the correlations between coherence relations and cue usage (and possibly other factors, if annotated) can afterwards be studied systematically. This conception of multi-level annotation presupposes, of course, that the technical problems of setting annotation levels in correspondence to one another be resolved.