Towards scalable speech act recognition in Twitter: tackling insufficient training data

Authors:
Renxian Zhang; Dehong Gao; Wenjie Li
Affiliations:
The Hong Kong Polytechnic University (all three authors)
Venue:
Proceedings of the Workshop on Semantic Analysis in Social Media
Year:
2012

Abstract

Recognizing speech act types in Twitter is of considerable theoretical interest and practical use. Our previous research did not adequately address the shortage of training data for this multi-class learning task. In this work, we assume only a small seed training set and experiment with two semi-supervised learning schemes, transductive SVM and graph-based label propagation, both of which can leverage knowledge from unlabeled data. Our extensive experiments establish the efficacy of semi-supervised learning and also show that transductive SVM is more suitable than graph-based label propagation for this task. These empirical findings and the detailed evidence behind them can contribute to scalable speech act recognition in Twitter.
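The abstract compares transductive SVM with graph-based label propagation. To illustrate the second scheme only, here is a minimal sketch of the standard iterative clamp-and-propagate form of label propagation over a dense, Gaussian-weighted similarity graph. It is not the authors' implementation. The two-dimensional feature vectors, the kernel width `sigma`, the iteration count and the class names `question` and `statement` are all hypothetical. They were chosen only to make a toy example run.

```python
import math

def gaussian_weight(x, y, sigma=1.0):
    """Edge weight w_ij = exp(-||x - y||^2 / (2 * sigma^2))."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-dist2 / (2 * sigma ** 2))

def label_propagation(points, seed_labels, classes, sigma=1.0, iters=100):
    """Iterative label propagation with the seed labels clamped.

    points: feature vectors for labeled and unlabeled items together
    seed_labels: dict mapping point index -> class name (the small seed set)
    classes: list of class names
    Returns the predicted class for every point.
    """
    n, k = len(points), len(classes)
    col = {c: j for j, c in enumerate(classes)}
    # Dense similarity graph without self-loops (hypothetical; real systems
    # would typically use a sparse k-nearest-neighbour graph).
    W = [[0.0 if i == j else gaussian_weight(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    # Label distributions: one-hot for seeds, uniform for unlabeled points.
    F = [[1.0 / k] * k for _ in range(n)]
    for i, c in seed_labels.items():
        F[i] = [1.0 if j == col[c] else 0.0 for j in range(k)]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            if i in seed_labels:
                new_F.append(F[i])  # clamp: seeds keep their given label
                continue
            total = sum(W[i])
            # Weighted average of the neighbours' current label distributions.
            new_F.append([sum(W[i][m] * F[m][j] for m in range(n)) / total
                          for j in range(k)])
        F = new_F
    # Assign each point the class with the highest propagated score.
    return [classes[max(range(k), key=lambda j: F[i][j])] for i in range(n)]

# Toy data: two well-separated clusters, one hypothetical seed in each.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # cluster around the "question" seed
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),   # cluster around the "statement" seed
]
seeds = {0: "question", 3: "statement"}
preds = label_propagation(points, seeds, ["question", "statement"])
print(preds)
```

On this toy input, every point inherits the label of the seed in its own cluster. Clamping the seed distributions at every iteration is what lets a very small seed set anchor the propagation. The dense O(n²) graph is only suitable for illustration.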