A sequential model for discourse segmentation

Authors:
Hugo Hernault;Danushka Bollegala;Mitsuru Ishizuka
Affiliations:
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan;Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan;Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010

Abstract

Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.
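
To make the sequential-labeling framing concrete, here is a minimal sketch of how segmentation into elementary units can be cast as one label per token, and how a boundary-level F-score can be computed from gold and predicted labels. The "B"/"C" label scheme, the function names, and the example sentence are assumptions for illustration, not the paper's actual setup; the Conditional Random Field itself is not implemented here. Counting every unit-initial token, including the sentence-initial one, is also a simplifying assumption.

```python
from typing import List, Tuple


def edus_to_labels(edus: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten units into tokens with one label per token.

    Hypothetical scheme: "B" marks the first token of a unit,
    "C" marks a token that continues the current unit.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for edu in edus:
        for i, tok in enumerate(edu):
            tokens.append(tok)
            labels.append("B" if i == 0 else "C")
    return tokens, labels


def boundary_f_score(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of precision and recall over "B" labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = sum(1 for g, p in zip(gold, pred) if g == "B" and p == "B")
    if true_pos == 0:
        return 0.0
    precision = true_pos / pred.count("B")
    recall = true_pos / gold.count("B")
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence split into three units (made-up example).
edus = [
    ["The", "bank", "said"],
    ["it", "would", "close"],
    ["because", "profits", "fell"],
]
tokens, labels = edus_to_labels(edus)

# A hypothetical prediction that misses the last boundary.
pred = ["B", "C", "C", "B", "C", "C", "C", "C", "C"]
score = boundary_f_score(labels, pred)
```

In a full system, a sequence model such as a CRF would be trained to predict the label column from lexical and syntactic features of each token, and `boundary_f_score` would be applied to its predictions on held-out text.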