Sequence models and ranking methods for discourse parsing

  • Authors:
  • James Pustejovsky; Ben Wellner

  • Affiliations:
  • Brandeis University; Brandeis University

  • Venue:
  • Ph.D. dissertation, Brandeis University
  • Year:
  • 2009

Abstract

Many important aspects of natural language reside beyond the level of a single sentence or clause, at the level of the discourse, including reference relations such as anaphora, notions of topic/focus and foreground/background information, as well as rhetorical relations such as CAUSATION or MOTIVATION. This dissertation is concerned with data-driven, machine learning-based methods for the latter: the identification of rhetorical discourse relations between abstract objects, including events, states, and propositions. Our focus is specifically on those relations based on the semantic content of their arguments, as opposed to the intent of the writer. We formulate a dependency view of discourse in which the arguments of a rhetorical relation are lexical heads, rather than arbitrary segments of text. This avoids the difficult problem of identifying the most elementary segments of the discourse. The resulting discourse parsing problem involves the following steps: (1) identification of discourse cue phrases that signal a rhetorical relation, (2) identification of the two arguments of the rhetorical relation signaled by a discourse cue phrase, and (3) determination of the type of the rhetorical relation. To address these problems, we apply a set of discriminative, statistical machine learning algorithms and explore the trade-offs among various sets of features. We demonstrate how performance can be improved through learning architectures that allow multiple co-dependent processing stages to be handled within a single model, rather than as a cascade of separate models. We capture additional dependencies with the novel application of sequence-structured Conditional Random Fields to the problem of identifying discourse relations and their rhetorical types. The proposed Conditional Random Field model is more general than those typically utilized in the literature, making use of non-factored feature functions to arrive at a conditional, sequential ranking model. Finally, we demonstrate the general applicability of our proposed discourse parsing model by applying it to the problem of syntactic dependency parsing, itself an important determinant for discourse parsing. This points towards a layered sequential (re-)ranking architecture for complex language processing tasks applicable beyond discourse parsing.
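To make the three-step pipeline concrete, here is a minimal, hypothetical sketch in Python. It is not the dissertation's actual system: the cue lexicon, the verb list standing in for POS tags, the nearest-head heuristic, and the cue-to-relation lookup are all illustrative stand-ins for the learned CRF and ranking components described in the abstract.

```python
# Toy illustration of the three-step discourse parsing pipeline:
# (1) cue detection, (2) argument-head identification, (3) relation typing.
# All lexicons and heuristics below are hypothetical placeholders.
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (1) Cue phrase identification: flag tokens that may signal a rhetorical relation.
CUE_LEXICON = {"because", "although", "since", "therefore", "but"}

def find_cues(tokens: List[str]) -> List[int]:
    """Return indices of candidate discourse cue phrases."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in CUE_LEXICON]

# (2) Argument identification: pick lexical heads for the two arguments of the
# relation signaled by a cue. A real system would rank candidates with a learned
# model; this toy heuristic takes the nearest "verb" on either side of the cue.
VERBS = {"fell", "rose", "said", "left", "stayed"}  # stand-in for POS tagging

def pick_argument_heads(tokens: List[str], cue_idx: int) -> Tuple[Optional[int], Optional[int]]:
    left = [i for i in range(cue_idx) if tokens[i].lower() in VERBS]
    right = [i for i in range(cue_idx + 1, len(tokens)) if tokens[i].lower() in VERBS]
    arg1 = max(left) if left else None     # nearest head to the left of the cue
    arg2 = min(right) if right else None   # nearest head to the right of the cue
    return arg1, arg2

# (3) Relation typing: assign a rhetorical relation type; a learned classifier
# would replace this simple lookup.
CUE_TO_RELATION = {"because": "CAUSATION", "since": "CAUSATION",
                   "although": "CONTRAST", "but": "CONTRAST",
                   "therefore": "RESULT"}

@dataclass
class DiscourseRelation:
    relation_type: str
    cue: str
    arg1_head: Optional[str]
    arg2_head: Optional[str]

def parse_discourse(tokens: List[str]) -> List[DiscourseRelation]:
    relations = []
    for cue_idx in find_cues(tokens):
        a1, a2 = pick_argument_heads(tokens, cue_idx)
        relations.append(DiscourseRelation(
            relation_type=CUE_TO_RELATION.get(tokens[cue_idx].lower(), "UNKNOWN"),
            cue=tokens[cue_idx],
            arg1_head=tokens[a1] if a1 is not None else None,
            arg2_head=tokens[a2] if a2 is not None else None,
        ))
    return relations

if __name__ == "__main__":
    sentence = "Prices fell because demand stayed weak".split()
    for rel in parse_discourse(sentence):
        print(rel)  # e.g. CAUSATION relation between heads "fell" and "stayed"
```

In the dissertation's formulation, the heuristic scoring above would be replaced by discriminative models, with the sequence-structured CRF allowing the cue, argument, and relation-type decisions to be made jointly rather than as a cascade.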