Several recent discourse parsers have employed fully supervised machine learning approaches. These methods require human annotators to create an extensive training corpus beforehand, which is a time-consuming and costly process. Unlabeled data, on the other hand, is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification. It analyzes which features co-occur in unlabeled data and uses these co-occurrences to extend the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank and the Penn Discourse Treebank indicate that the proposed method significantly improves classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of approximately 1000 labeled instances, the proposed method improves accuracy and macro-average F-score by up to 50% compared to a baseline classifier. We believe that the proposed method is a first step towards detecting low-occurrence relations, which is useful for domains that lack annotated data.
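The idea of extending feature vectors with co-occurrence information from unlabeled data can be illustrated with a small sketch. The abstract does not specify the exact co-occurrence measure or extension rule, so everything below is an assumption for illustration and not the authors' formulation. Instances are assumed to be sets of binary string features. The association between features is assumed to be the conditional co-occurrence probability P(g | f) estimated from unlabeled counts. The hypothetical helpers `cooccurrence_stats` and `extend_features` add every absent feature whose average association with the instance's features reaches a hypothetical `threshold`.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(unlabeled):
    """Count single features and unordered feature pairs over unlabeled instances.

    Each instance is assumed to be an iterable of binary (string) features.
    Pair keys are sorted tuples, so (a, b) and (b, a) share one count.
    """
    feat_count = Counter()
    pair_count = Counter()
    for feats in unlabeled:
        uniq = sorted(set(feats))
        feat_count.update(uniq)
        pair_count.update(combinations(uniq, 2))
    return feat_count, pair_count


def extend_features(feats, feat_count, pair_count, threshold=0.5):
    """Return {feature: weight} for one (labeled) instance.

    Original features keep weight 1.0. Any feature g absent from the instance
    is added with weight equal to the mean of P(g | f) = count(f, g) / count(f)
    over the instance's features f seen in the unlabeled data, provided that
    mean is >= threshold. (Hypothetical scoring rule, for illustration only.)
    """
    present = set(feats)
    extended = {f: 1.0 for f in present}
    known = [f for f in present if feat_count[f] > 0]
    if not known:
        # No evidence from unlabeled data: leave the vector unchanged.
        return extended
    for g in feat_count:
        if g in present:
            continue
        score = sum(
            pair_count[tuple(sorted((f, g)))] / feat_count[f] for f in known
        ) / len(known)
        if score >= threshold:
            extended[g] = score
    return extended


if __name__ == "__main__":
    # Toy unlabeled instances with hypothetical feature names.
    unlabeled = [
        {"w1:but", "w2:however"},
        {"w1:but", "w2:however", "len:short"},
        {"w1:but", "len:short"},
        {"w2:however"},
    ]
    fc, pc = cooccurrence_stats(unlabeled)
    print(extend_features({"w1:but"}, fc, pc))
```

On this toy data, the instance `{"w1:but"}` gains `w2:however` and `len:short`, each with weight 2/3. A classifier trained on few labeled examples can therefore still use features it rarely or never saw together with a given relation. That is the kind of gain the abstract reports for small training sets.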