The classical "success story" of corpus annotation is that of the various syntax treebanks, which provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations on the discourse level, i.e., information that reaches beyond sentence boundaries. Phenomena that have been annotated include coreference links, the scope of connectives, and coherence relations. For many of these phenomena there is no general agreement in the research community on how they should be handled, and therefore "recycling" corpora, i.e., reuse by other people and for other purposes, is often difficult. (To some extent, this is because discourse annotation deals "only" with surface reflections of underlying, abstract objects.) At the same time, the effort needed to build high-quality discourse corpora is considerable, and thus one should be careful in deciding how to invest it. One way for annotation projects to provide added value is through shared corpora: if a variety of annotation efforts are carried out on the same primary data, the resulting series of annotation levels can yield insights that the creators of the individual levels had not explicitly planned for. A clear case is the relationship between coherence relations and connective use: when both levels are marked individually and with independent annotation guidelines, the correlations between coherence relations, cue usage, and possibly other factors (if annotated) can afterwards be studied systematically. This conception of multi-level annotation presupposes, of course, that the technical problems of setting annotation levels in correspondence with one another have been resolved.
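As a rough illustration of what setting two annotation levels in correspondence can look like, the following minimal sketch assumes a standoff representation in which each level stores labels over character offsets of the same primary text. The `Span` class, the helper functions, the toy text, and the labels are all hypothetical and are not taken from any particular corpus or annotation tool; the sketch simply counts how often a coherence-relation span overlaps a connective span.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One standoff annotation: a label over character offsets [start, end)."""
    start: int
    end: int
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character of the primary text."""
    return a.start < b.end and b.start < a.end


def cooccurrences(relations, connectives):
    """Count (relation label, connective label) pairs whose spans overlap."""
    counts = Counter()
    for rel in relations:
        for conn in connectives:
            if overlaps(rel, conn):
                counts[(rel.label, conn.label)] += 1
    return counts


# Toy primary data; both layers below are invented for illustration only.
text = "It rained, so we stayed in. We were bored, but the film was good."
first_end = text.index(". ") + 1  # offset just past the first sentence

# Level 1: coherence relations, annotated over whole sentences.
relation_layer = [
    Span(0, first_end, "cause"),
    Span(first_end + 1, len(text), "contrast"),
]

# Level 2: connectives, annotated independently on the same text.
so_start = text.index("so")
but_start = text.index("but")
connective_layer = [
    Span(so_start, so_start + len("so"), "so"),
    Span(but_start, but_start + len("but"), "but"),
]

table = cooccurrences(relation_layer, connective_layer)
for (relation, connective), n in sorted(table.items()):
    print(relation, connective, n)
```

Because both levels refer only to offsets in the shared primary data, neither needs to know about the other's guidelines or labels; the correspondence is computed afterwards. Real corpora would additionally have to deal with tokenization differences, discontinuous spans, and nested relations, which this sketch ignores.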