A sequential model for discourse segmentation

  • Authors:
  • Hugo Hernault; Danushka Bollegala; Mitsuru Ishizuka

  • Affiliations:
  • Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan

  • Venue:
  • CICLing'10: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2010

Abstract

Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.
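Treating segmentation as sequential labeling means each token gets a label, and a CRF predicts the whole label sequence jointly instead of classifying each token in isolation. The Python sketch below illustrates only the data side of that framing, and it rests on assumptions not taken from the paper: a two-label scheme (B for a token that begins an elementary discourse unit, C for one that continues it), a small hypothetical feature set (lower-cased word, a POS tag assumed to come from an external tagger, a punctuation flag, and the same cues for the neighboring tokens), and a boundary F-score that ignores the trivially labeled first token. The names edus_to_labels, token_features and boundary_f1 are illustrative. The paper's actual lexical and syntactic features and its evaluation protocol may differ. Training the CRF itself is left to a toolkit. For example, the sklearn-crfsuite wrapper around CRFsuite accepts per-token feature dictionaries of this shape.

```python
import string
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical label scheme (an assumption, not from the paper):
# "B" = token begins a new EDU, "C" = token continues the current EDU.
BEGIN, CONTINUE = "B", "C"

FeatureValue = Union[str, bool]


def edus_to_labels(edus: Sequence[Sequence[str]]) -> Tuple[List[str], List[str]]:
    """Flatten EDUs (each a list of tokens) into one token list and per-token labels."""
    tokens: List[str] = []
    labels: List[str] = []
    for edu in edus:
        for i, tok in enumerate(edu):
            tokens.append(tok)
            labels.append(BEGIN if i == 0 else CONTINUE)
    return tokens, labels


def _is_punct(tok: str) -> bool:
    """True if the token is non-empty and made only of ASCII punctuation."""
    return len(tok) > 0 and all(ch in string.punctuation for ch in tok)


def token_features(tokens: Sequence[str], pos: Sequence[str], i: int) -> Dict[str, FeatureValue]:
    """Illustrative lexical/syntactic features for token i with a +/-1 context window."""
    feats: Dict[str, FeatureValue] = {
        "word": tokens[i].lower(),
        "pos": pos[i],
        "is_punct": _is_punct(tokens[i]),
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1].lower()
        feats["prev_pos"] = pos[i - 1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1].lower()
        feats["next_pos"] = pos[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def boundary_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """F-score over EDU boundaries (tokens labeled BEGIN), skipping token 0,
    which starts a unit by construction."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted label sequences differ in length")
    gold_b = {i for i, y in enumerate(gold) if y == BEGIN and i > 0}
    pred_b = {i for i, y in enumerate(pred) if y == BEGIN and i > 0}
    if not gold_b and not pred_b:
        return 1.0
    tp = len(gold_b & pred_b)
    precision = tp / len(pred_b) if pred_b else 0.0
    recall = tp / len(gold_b) if gold_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy sentence split into two hypothetical EDUs.
    edus = [["Although", "it", "rained", ","], ["we", "went", "out", "."]]
    tokens, gold = edus_to_labels(edus)
    pos = ["IN", "PRP", "VBD", ",", "PRP", "VBD", "RP", "."]  # assumed tagger output
    X = [token_features(tokens, pos, i) for i in range(len(tokens))]
    print(gold)  # ['B', 'C', 'C', 'C', 'B', 'C', 'C', 'C']
    print(X[3])
    # A made-up prediction with one correct and one spurious boundary.
    pred = ["B", "C", "C", "C", "B", "C", "B", "C"]
    print(round(boundary_f1(gold, pred), 3))  # 0.667
```

In the toy run, precision is 1/2 and recall is 1/1, giving an F-score of 2/3. The 0.94 and 0.98 figures in the abstract are the paper's reported results on the RST Discourse Treebank and are not reproduced by this example.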