Panel session: discourse annotation

Authors:
Manfred Stede;Janyce Wiebe;Eva Hajičová;Brian Reese;Simone Teufel;Bonnie Webber;Theresa Wilson
Affiliations:
University of Potsdam;University of Pittsburgh;Charles University;University of Texas at Austin;University of Cambridge;University of Edinburgh;University of Pittsburgh
Venue:
LAW '07 Proceedings of the Linguistic Annotation Workshop
Year:
2007

Abstract

The classic "success story" of corpus annotation is the family of syntactic treebanks, which provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations at the discourse level, i.e., information that reaches beyond sentence boundaries. Phenomena that have been annotated include coreference links, the scope of connectives, and coherence relations. For many of these phenomena, the research community has not reached general agreement on how they should be handled, and reusing ("recycling") such corpora for other purposes and by other researchers is therefore often difficult. (To some extent, this is because discourse annotation deals "only" with surface reflections of underlying, abstract objects.) At the same time, the effort needed to build high-quality discourse corpora is considerable, and one should therefore decide carefully how to invest it. One way for annotation projects to provide added value is through shared corpora: if a variety of annotation efforts are carried out on the same primary data, the resulting series of annotation levels can yield insights that the creators of the individual levels had not explicitly planned for. A clear case is the relationship between coherence relations and connective use: when both levels are annotated separately and with independent annotation guidelines, the correlations among coherence relations, cue usage, and possibly other annotated factors can afterwards be studied systematically. This conception of multi-level annotation presupposes, of course, that the technical problems of setting annotation levels in correspondence with one another have been resolved.
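
To make the idea of setting annotation levels in correspondence more concrete, the following is a minimal sketch and not a method described by the panel. It assumes that each level is stored as standoff annotation, i.e., as labelled character-offset spans over the same primary text, and that two annotations correspond when their spans overlap. The example text, the labels, and the names `Span`, `overlaps`, and `cooccurrence` are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Span(NamedTuple):
    """One standoff annotation: a labelled character range of the primary text."""
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character position."""
    return a.start < b.end and b.start < a.end


def cooccurrence(relations: list[Span], connectives: list[Span]) -> Counter:
    """Count (relation label, connective label) pairs whose spans overlap."""
    counts: Counter = Counter()
    for rel in relations:
        for conn in connectives:
            if overlaps(rel, conn):
                counts[(rel.label, conn.label)] += 1
    return counts


# Hypothetical primary data shared by both annotation levels.
text = "It rained, so we stayed in. We were bored, but the film was good."

# Level 1 (assumed): coherence relations, one per sentence.
relations = [Span(0, 27, "cause"), Span(28, 65, "contrast")]

# Level 2 (assumed): connectives, annotated independently of level 1.
connectives = [Span(11, 13, "so"), Span(43, 46, "but")]

for (relation, connective), n in sorted(cooccurrence(relations, connectives).items()):
    print(f"{relation}\t{connective}\t{n}")
```

Because both levels refer only to offsets in the shared text, neither needs to know about the other's guidelines; the correspondence is computed afterwards. Real corpora would require a more careful alignment criterion than simple overlap, for instance when the levels segment the text differently.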