Statistical language modeling for speech disfluencies

  • Authors:
  • A. Stolcke; E. Shriberg

  • Affiliations:
  • Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA; Inst. of Human-Machine-Commun., Munich Univ. of Technol., Germany

  • Venue:
  • ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1996

Abstract

Speech disfluencies (such as filled pauses, repetitions, restarts) are among the characteristics distinguishing spontaneous speech from planned or read speech. We introduce a language model that predicts disfluencies probabilistically and uses an edited, fluent context to predict following words. The model is based on a generalization of the standard N-gram language model. It uses dynamic programming to compute the probability of a word sequence, taking into account possible hidden disfluency events. We analyze the model's performance for various disfluency types on the Switchboard corpus. We find that the model reduces word perplexity in the neighborhood of disfluency events; however, overall differences are small and have no significant impact on recognition accuracy. We also note that for modeling of the most frequent type of disfluency, filled pauses, a segmentation of utterances into linguistic (rather than acoustic) units is required. Our analysis illustrates a generally useful technique for language model evaluation based on local perplexity comparisons.
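
The abstract's description of the model (an N-gram generalized with hidden disfluency events that are summed out by dynamic programming, with prediction from an edited, fluent context) can be illustrated with a short sketch. The code below is not the authors' implementation: the `bigram` stand-in, the restricted event set (filled pause and one-word repetition), and the event priors are hypothetical placeholders chosen only to show how the summation over cleanup hypotheses might look for a bigram model.

```python
"""Minimal sketch of a hidden-event bigram model that sums over possible
disfluency events with dynamic programming (illustrative only)."""

from collections import defaultdict

FILLED_PAUSES = {"uh", "um"}
P_FILLED_PAUSE = 0.05   # made-up prior for a filled-pause event
P_REPETITION = 0.02     # made-up prior for a one-word repetition
P_FLUENT = 1.0 - P_FILLED_PAUSE - P_REPETITION


def bigram(word, history):
    """Stand-in for a trained bigram P(word | history); uniform here."""
    return 1.0 / 1000.0


def sequence_probability(words, start="<s>"):
    """Return the sum over hidden disfluency events of P(words, events).

    The DP state is the last word of the *edited* (cleaned-up) history,
    since a bigram conditions on only one previous word. Filled pauses
    and repeated words are excluded from that history, so later words
    are predicted from a fluent context.
    """
    # alpha[h] = probability of the prefix whose edited history ends in h
    alpha = {start: 1.0}
    for w in words:
        new_alpha = defaultdict(float)
        for h, p in alpha.items():
            # Fluent continuation: w is predicted from the edited context
            # and becomes the new history word.
            new_alpha[w] += p * P_FLUENT * bigram(w, h)
            # Filled-pause event: w is a filler and leaves the edited
            # history unchanged (the event prior absorbs emitting it).
            if w in FILLED_PAUSES:
                new_alpha[h] += p * P_FILLED_PAUSE
            # Repetition event: w repeats the previous edited word and is
            # likewise dropped from the predictive context.
            if w == h:
                new_alpha[h] += p * P_REPETITION
        alpha = dict(new_alpha)
    return sum(alpha.values())


if __name__ == "__main__":
    # Toy usage: the filler and repetition in "i uh i want a flight" can be
    # explained either as hidden disfluency events or as literal words.
    print(sequence_probability("i uh i want a flight".split()))
```

The local perplexity comparison mentioned in the abstract would then contrast such per-word probabilities against those of a plain N-gram in the neighborhood of disfluency events, rather than relying only on corpus-level perplexity.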