This article presents a probabilistic scheme for detecting interruption points (IPs) in spontaneous speech using prosodic features extracted at inter-syllable boundaries. Because the error rate of spontaneous speech recognition is high, a combined acoustic model that considers both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and to output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. A Conditional Random Field (CRF) model, which uses the clustered prosodic features of the current potential IP and of its preceding and succeeding inter-syllable boundaries, then outputs an IP likelihood measure. Finally, the recognition confidence, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence for the input speech. Pitch reset and lengthening are also applied to improve IP detection performance. The Mandarin Conversational Dialogue Corpus is used for evaluation. Experimental results show that the proposed IP detection approach outperforms the hidden Markov model and the Maximum Entropy model by 10.56% and 6.5%, respectively, under the same experimental conditions. In addition, using pitch reset and lengthening information reduces the IP detection error rate by a further 9.15%. These results confirm that the proposed model, built on prosodic features at inter-syllable boundaries, can effectively detect interruption points in spontaneous Mandarin speech.
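The abstract does not say how the three scores are integrated. As a rough illustration only, the sketch below assumes a weighted log-linear combination and a fixed per-boundary threshold. The function names, weights, threshold, and example values are all hypothetical and not taken from the article. The article also speaks of an optimal IP sequence, which suggests a search over whole sequences and not the independent per-boundary decisions used here.

```python
import math

# Hypothetical decision threshold: the combined log-score obtained when
# all three inputs equal 0.5 and all weights are 1.0.
DEFAULT_THRESHOLD = 3 * math.log(0.5)


def combine_ip_scores(confidence, prosody_prob, crf_likelihood,
                      weights=(1.0, 1.0, 1.0), eps=1e-12):
    """Weighted log-linear score for one candidate inter-syllable boundary.

    The log-linear form and the weights are assumptions made for this
    sketch; eps guards against log(0).
    """
    w_conf, w_pros, w_crf = weights
    return (w_conf * math.log(confidence + eps)
            + w_pros * math.log(prosody_prob + eps)
            + w_crf * math.log(crf_likelihood + eps))


def detect_ips(boundaries, threshold=DEFAULT_THRESHOLD,
               weights=(1.0, 1.0, 1.0)):
    """Return indices of boundaries whose combined score exceeds threshold.

    Each boundary is a tuple:
    (recognition confidence, prosodic-feature probability, CRF IP likelihood).
    """
    return [i for i, (conf, pros, crf) in enumerate(boundaries)
            if combine_ip_scores(conf, pros, crf, weights) > threshold]


# Made-up scores for three candidate boundaries.
boundaries = [(0.9, 0.1, 0.05), (0.8, 0.7, 0.9), (0.95, 0.2, 0.1)]
print(detect_ips(boundaries))  # → [1]
```

With these made-up values only the second boundary scores high on all three measures, so it is the only one flagged. In practice the weights and threshold would need tuning on held-out data.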