Towards Discourse Meaning

  • Authors:
  • Aravind K. Joshi

  • Affiliations:
  • Department of Computer and Information Science and Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, USA

  • Venue:
  • Innovations for Requirement Analysis. From Stakeholders' Needs to Formal Designs
  • Year:
  • 2008

Abstract

The overall goal is to discuss some issues concerning the dependencies at the discourse level and at the sentence level. First, however, I will briefly describe the Penn Discourse TreeBank (PDTB), a corpus in which we annotate discourse connectives (explicit and implicit) and their arguments, together with the "attributions" of the arguments and of the relations denoted by the connectives, as well as the senses of the connectives. I will then focus on the complexity of these dependencies in terms of (a) the elements that bear the dependency relations, (b) graph-theoretic properties of the dependencies, such as nested and crossed dependencies and dependencies with shared arguments, and (c) attributions and their relationship to the dependencies, among others. I will compare these dependencies with those at the sentence level, discuss some issues related to the transition from the sentence level to the level of "immediate discourse", and propose some conjectures.
An increasing interest in moving human language technology beyond the level of the sentence, in text summarization, question answering, and natural language generation, among others, has recently led to the development of several resources that are richly annotated at the discourse level. Among these is the PDTB, a large-scale resource of annotated discourse relations and their arguments over the one-million-word Wall Street Journal (WSJ) Corpus. Because the sentence-level syntactic annotations of the Penn Treebank [2] and the predicate-argument annotations of the Propbank [4] were done over the same target corpus, the PDTB provides a richer substrate for developing and evaluating practical algorithms, while supporting the extraction of useful features pertaining to syntax, semantics, and discourse all at once. The PDTB is the first resource to follow a lexically grounded approach to the annotation of discourse relations.
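The graph-theoretic distinctions mentioned above (nested and crossed dependencies, and dependencies with shared arguments) can be made concrete with a small sketch. It assumes, as a hypothetical simplification, that each argument is reduced to a single position (for instance a clause index) and each discourse dependency to an arc between the positions of its Arg1 and Arg2; real PDTB arguments are text spans that may be discontinuous. The function names and the label "disjoint" are illustrative, not taken from the PDTB.

```python
from itertools import combinations


def classify_pair(d1, d2):
    """Classify two dependencies, each given as (arg1_pos, arg2_pos).

    Hypothetical simplification: each argument is one integer position.
    """
    a1, b1 = sorted(d1)
    a2, b2 = sorted(d2)
    # Two arcs meeting at a position share an argument.
    if {a1, b1} & {a2, b2}:
        return "shared"
    # Reorder so that the first arc is the one that starts earlier.
    if a2 < a1:
        a1, b1, a2, b2 = a2, b2, a1, b1
    if b1 < a2:
        return "disjoint"  # first arc ends before the second begins
    if b2 < b1:
        return "nested"    # second arc lies entirely inside the first
    return "crossed"       # a1 < a2 < b1 < b2


def profile(deps):
    """Count each configuration over all pairs of dependencies."""
    counts = {"disjoint": 0, "nested": 0, "crossed": 0, "shared": 0}
    for d1, d2 in combinations(deps, 2):
        counts[classify_pair(d1, d2)] += 1
    return counts


# Toy arcs over clause indices 0..4 (invented, not corpus data).
deps = [(0, 1), (1, 2), (0, 3), (2, 4)]
print(profile(deps))  # {'disjoint': 1, 'nested': 1, 'crossed': 1, 'shared': 3}
```

Checking for shared arguments first means two arcs that meet at one position are never also counted as nested or crossed; that ordering is a choice made for this sketch, not a claim about how the PDTB analyses are defined.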
Discourse relations, when realized explicitly in the text, are annotated by marking the lexical items that express them, called discourse connectives, which supports their automatic identification. The PDTB adopts a theory-neutral approach to annotation, making no commitment to the kinds of high-level structures that may be built from the low-level annotations of relations and their arguments. This approach makes the corpus useful to researchers working within different frameworks. Theory neutrality also permits investigation of the general question of how structure at the sentence level relates to structure at the discourse level, at least for the part of discourse structure that is "parallel" to sentence structure [6]. In addition to the argument structure of discourse relations, the PDTB provides sense labels for each relation following a hierarchical classification scheme. Annotation of senses highlights the polysemy of connectives, making the PDTB useful for sense disambiguation tasks [3]. Finally, the PDTB separately annotates the attribution of each discourse relation and of each of its two arguments. Although attribution is a relation between agents and abstract objects, and thus not a discourse relation, it is annotated in the PDTB because (a) it is useful for applications such as subjectivity analysis and multi-perspective QA [5], and (b) it exhibits an interesting and complex interaction between sentence-level structure and discourse structure [1]. The first preliminary release of the PDTB was in April 2006. A significantly extended version was released as PDTB-2.0 in February 2008 through the Linguistic Data Consortium (LDC); see http://www.seas.upenn.edu/~pdtb for the annotation manual, published papers, tutorial slides, and a link to the LDC.
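The annotation layers described above (a connective, two arguments, a hierarchical sense label, and attribution on the relation and on each argument) can be pictured as one record per relation. The sketch below is a hypothetical Python data model, not the PDTB file format or any official reader; the class and function names, the dotted sense strings, and the "Writer"/"Other" attribution values are assumptions made for illustration. It also shows how sense labels expose connective polysemy by grouping the senses observed for each explicit connective.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Attribution:
    # Hypothetical simplification: only the source of the content is kept.
    source: str = "Writer"


@dataclass
class DiscourseRelation:
    conn_type: str   # "Explicit" or "Implicit"
    connective: str  # explicit connective, or one supplied for an implicit relation
    arg1: str
    arg2: str        # the argument syntactically bound to the connective
    sense: str       # dotted path through the sense hierarchy
    rel_attr: Attribution = field(default_factory=Attribution)
    arg1_attr: Attribution = field(default_factory=Attribution)
    arg2_attr: Attribution = field(default_factory=Attribution)


def sense_at_level(sense: str, level: int) -> str:
    """Truncate a dotted sense path to its first `level` tiers (1 = class)."""
    return ".".join(sense.split(".")[:level])


def senses_by_connective(relations, level=1):
    """Map each explicit connective (lowercased) to its set of senses."""
    table = defaultdict(set)
    for rel in relations:
        if rel.conn_type == "Explicit":
            table[rel.connective.lower()].add(sense_at_level(rel.sense, level))
    return dict(table)


# Invented example relations, not PDTB data.
relations = [
    DiscourseRelation("Explicit", "since", "Prices have risen",
                      "the plant closed", "Temporal.Asynchronous.Succession"),
    DiscourseRelation("Explicit", "Since", "sales are weak",
                      "rates are high", "Contingency.Cause.Reason",
                      arg1_attr=Attribution("Other")),
    DiscourseRelation("Implicit", "because", "The firm cut jobs",
                      "demand fell", "Contingency.Cause.Reason"),
]

for conn, senses in senses_by_connective(relations).items():
    print(conn, sorted(senses))  # since ['Contingency', 'Temporal']
```

Truncating the sense path with `sense_at_level` lets the same grouping be done at the class level or at finer tiers; which level suits a given disambiguation task is a design choice this sketch does not settle.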