Interruption Point Detection of Spontaneous Speech Using Inter-Syllable Boundary-Based Prosodic Features

Authors:
Chung-Hsien Wu;Wei-Bin Liang;Jui-Feng Yeh
Affiliations:
National Cheng Kung University;National Cheng Kung University;National Chiayi University
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2011

Abstract

This article presents a probabilistic scheme for detecting the interruption point (IP) in spontaneous speech using prosodic features extracted at inter-syllable boundaries. Because the error rate of spontaneous speech recognition is high, a combined acoustic model that considers both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and to output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. A Conditional Random Field (CRF) model, which uses the clustered prosodic features of the current potential IP and of its preceding and succeeding inter-syllable boundaries, then outputs an IP likelihood measure. Finally, the confidence of the recognized speech, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence of the input spontaneous speech. Pitch reset and lengthening are also applied to improve IP detection performance. The Mandarin Conversational Dialogue Corpus is adopted for evaluation. Experimental results show that the proposed IP detection approach outperforms the hidden Markov model and the maximum entropy model by 10.56% and 6.5%, respectively, under the same experimental conditions. In addition, the IP detection error rate can be further reduced by 9.15% by using pitch reset and lengthening information. These results confirm that the proposed model, built on inter-syllable boundary-based prosodic features, can effectively detect the interruption point in spontaneous Mandarin speech.
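The abstract does not say how the three scores are integrated. As a rough illustration only, the sketch below assumes a weighted log-linear combination of the recognition confidence, the prosodic feature probability, and the CRF-based IP likelihood at each inter-syllable boundary, followed by a simple threshold. The function names, the weights, and the threshold are hypothetical and are not taken from the article, and the per-boundary threshold is a simplification of the sequence-level decoding the article describes.

```python
import math

EPS = 1e-12  # guards against log(0); an assumption of this sketch


def combine_ip_scores(confidence, prosody_prob, crf_likelihood,
                      weights=(1.0, 1.0, 1.0)):
    """Weighted sum of log-scores for one inter-syllable boundary.

    All three inputs are assumed to lie in (0, 1]. The log-linear form
    and the weights are hypothetical, not the article's formulation.
    """
    w_conf, w_pros, w_crf = weights
    return (w_conf * math.log(max(confidence, EPS))
            + w_pros * math.log(max(prosody_prob, EPS))
            + w_crf * math.log(max(crf_likelihood, EPS)))


def detect_ips(boundaries, threshold, weights=(1.0, 1.0, 1.0)):
    """Return indices of boundaries whose combined score meets the threshold.

    `boundaries` is a list of (confidence, prosody_prob, crf_likelihood)
    tuples, one per inter-syllable boundary.
    """
    return [i for i, scores in enumerate(boundaries)
            if combine_ip_scores(*scores, weights=weights) >= threshold]


# Toy values, invented for illustration only.
boundaries = [
    (0.9, 0.10, 0.20),
    (0.8, 0.90, 0.95),
    (0.7, 0.30, 0.10),
]
ip_indices = detect_ips(boundaries, threshold=-1.0)
```

With these toy values only the second boundary scores above the threshold, so `ip_indices` is `[1]`. In practice the weights and threshold would have to be tuned on held-out data.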