Semi-supervised discourse relation classification with structural learning

Authors:
Hugo Hernault; Danushka Bollegala; Mitsuru Ishizuka
Affiliations:
Graduate School of Information Science & Technology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan (all three authors)
Venue:
CICLing'11: Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing, Volume Part I
Year:
2011

Abstract

The corpora available for training discourse relation classifiers are annotated with a general set of discourse relations. However, some applications require custom discourse relations, and creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, the learned auxiliary classifiers are used to extend the feature vectors on which a discourse relation classifier is trained. By defining a relevant set of auxiliary classification problems, we show that the proposed method brings an improvement of at least 50% in accuracy and F-score on the RST Discourse Treebank and the Penn Discourse Treebank when small training sets of about 1000 instances are used. This makes the approach attractive for training discourse relation classifiers in domains where little labeled training data is available.
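The two steps above can be illustrated with a minimal sketch. This is not the authors' implementation. It is a pure-Python illustration of structural learning in the style of Ando and Zhang's framework, and it rests on several assumptions. Each auxiliary problem is assumed, hypothetically, to predict whether a chosen "pivot" feature is present in an unlabeled vector, as in structural correspondence learning. The auxiliary classifiers are linear models trained with squared-loss SGD. The function names, the toy data and the projection size `h` are all illustrative. The weight vectors of the auxiliary classifiers are compressed into a projection `theta` (top left singular vectors of the stacked weights), and each feature vector `x` is extended to `[x ; theta x]` before the discourse relation classifier is trained.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_aux_classifier(X: List[Vector], pivot: int, epochs: int = 20,
                         lr: float = 0.1, l2: float = 0.01) -> Vector:
    """Hypothetical auxiliary problem: predict whether feature `pivot` is
    non-zero from the other features. It needs no manual labels, so it can be
    trained on unlabeled data. Linear model, squared loss, plain SGD."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        for x in X:
            xm = [float(v) for v in x]
            xm[pivot] = 0.0  # mask the pivot so it cannot predict itself
            y = 1.0 if x[pivot] != 0 else -1.0
            err = dot(w, xm) - y
            for j in range(d):
                w[j] -= lr * (err * xm[j] + l2 * w[j])
    return w


def top_left_singular_vectors(W: List[Vector], h: int,
                              iters: int = 300) -> List[Vector]:
    """Top-h left singular vectors of the d x m matrix whose columns are the
    auxiliary weight vectors, via power iteration on W W^T with Gram-Schmidt."""
    d = len(W[0])
    # A = W W^T is d x d, symmetric and positive semi-definite.
    A = [[sum(w[i] * w[j] for w in W) for j in range(d)] for i in range(d)]
    basis: List[Vector] = []
    for _ in range(h):
        v = [1.0 / (i + 1) for i in range(d)]  # deterministic, non-uniform start
        for _ in range(iters):
            u = [dot(row, v) for row in A]
            for b in basis:  # stay orthogonal to directions already found
                c = dot(u, b)
                u = [ui - c * bi for ui, bi in zip(u, b)]
            norm = math.sqrt(dot(u, u))
            if norm < 1e-12:
                return basis  # rank of W is smaller than h
            v = [ui / norm for ui in u]
        basis.append(v)
    return basis


def extend(x: Vector, theta: List[Vector]) -> Vector:
    """Extended representation [x ; theta x] for training the main classifier."""
    return list(x) + [dot(t, x) for t in theta]


if __name__ == "__main__":
    # Toy unlabeled instances with 5 binary features (made up for illustration).
    X_unlabeled = [
        [1, 0, 1, 0, 1],
        [1, 0, 1, 1, 0],
        [0, 1, 0, 1, 1],
        [0, 1, 1, 1, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 1, 1, 1],
    ]
    W = [train_aux_classifier(X_unlabeled, p) for p in (0, 1)]
    theta = top_left_singular_vectors(W, h=2)
    print(extend(X_unlabeled[0], theta))
```

Power iteration is used here only to keep the sketch dependency-free; with NumPy, `theta` would normally come from `numpy.linalg.svd` applied to the stacked weight matrix. The extended vectors would then be fed to any standard supervised classifier trained on the small labeled set.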