Multi-view semi-supervised learning for dialog act segmentation of speech

Authors:
Umit Guz;Sébastien Cuendet;Dilek Hakkani-Tür;Gokhan Tur
Affiliations:
International Computer Science Institute, Speech Group, Berkeley, CA and Engineering Faculty, Department of Electronics Engineering, Isik University, Istanbul, Turkey;Optaros, Zurich, Switzerland and International Computer Science Institute, Speech Group, Berkeley, CA;International Computer Science Institute, Speech Group, Berkeley, CA;Speech Technology and Research Laboratory, SRI International, Menlo Park, CA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Sentence segmentation of speech aims at determining sentence boundaries in a stream of words as output by the speech recognizer. Typically, statistical methods are used for sentence segmentation. However, they require significant amounts of labeled data, preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms to the sentence boundary classification problem by using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We especially focus on two semi-supervised learning approaches, namely, self-training and co-training. We also compare two example selection strategies for co-training, namely, agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is well suited to multi-view learning, since the data can be represented by two disjoint and redundantly sufficient feature sets, namely, lexical and prosodic features. Performance of the lexical and prosodic models is improved by 26% and 11% relative, respectively, when only a small set of manually labeled examples is used. When both information sources are combined, the semi-supervised learning methods improve the baseline F-Measure from 69.8% to 74.2%.
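
The co-training idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the distance-margin confidence score, the function names (`train_centroids`, `predict`, `co_train`), the parameters `rounds` and `per_round`, and the toy one-dimensional "lexical" and "prosodic" features are all assumptions made for illustration. In each round, one classifier per view is trained on the labeled pool, each classifier labels the unlabeled word boundaries it is most confident about, and those examples join the shared pool with both views attached.

```python
def train_centroids(X, y):
    """Stand-in classifier for one view: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model, x):
    """Return (label, confidence). Confidence is the gap between the distances
    to the two nearest class centroids (an assumed, simple confidence score)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(lab_a, lab_b, labels, unl_a, unl_b, rounds=5, per_round=2):
    """Basic co-training over two views (a and b) of the same examples."""
    lab_a, lab_b, y = list(lab_a), list(lab_b), list(labels)
    pool = list(range(len(unl_a)))  # indices of still-unlabeled examples
    for _ in range(rounds):
        if not pool:
            break
        model_a = train_centroids(lab_a, y)
        model_b = train_centroids(lab_b, y)
        chosen = {}
        # Each view nominates the examples it labels most confidently.
        for model, unl in ((model_a, unl_a), (model_b, unl_b)):
            ranked = sorted(pool, key=lambda i: predict(model, unl[i])[1],
                            reverse=True)
            for i in ranked[:per_round]:
                chosen.setdefault(i, predict(model, unl[i])[0])
        # Nominated examples join the shared pool with BOTH views attached.
        for i, label in chosen.items():
            lab_a.append(unl_a[i])
            lab_b.append(unl_b[i])
            y.append(label)
            pool.remove(i)
    return train_centroids(lab_a, y), train_centroids(lab_b, y)


# Toy data (invented): one feature per view for each word boundary.
lex_labeled = [[0.9], [0.1]]     # hypothetical lexical score
pros_labeled = [[0.8], [0.05]]   # hypothetical pause-based prosodic score
labels = ["boundary", "none"]
lex_unlabeled = [[0.8], [0.2], [0.95], [0.05], [0.7], [0.3]]
pros_unlabeled = [[0.6], [0.1], [0.9], [0.0], [0.7], [0.2]]

model_lex, model_pros = co_train(lex_labeled, pros_labeled, labels,
                                 lex_unlabeled, pros_unlabeled)
print(predict(model_lex, [0.85])[0], predict(model_pros, [0.02])[0])
```

Under the same assumptions, self-training would correspond to a single classifier feeding its own confident predictions back into its training set. The agreement and disagreement strategies named in the abstract would change only how `chosen` is filled, for example by comparing the two views' labels and confidences for each candidate; their exact definitions are not given in the abstract, so they are not sketched here.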