Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition

Authors:
Chung-Hsien Wu;Gwo-Lang Yan
Affiliations:
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, Republic of China;Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, Republic of China
Venue:
Journal of VLSI Signal Processing Systems
Year:
2004

Abstract

Most automatic speech recognizers (ASRs) concentrate on read speech, which differs from spontaneous speech with its disfluencies. ASRs cannot handle speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts, and silent pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “hem” in spontaneous speech. The Karhunen-Loève transform (KLT) and linear discriminant analysis (LDA) were adopted to select discriminant features for filled pause detection. Bartlett hypothesis testing was used to determine a suitable number of discriminant features, and twenty-six features were selected. Gaussian mixture models (GMMs), trained with a gradient descent algorithm, were used to improve filled pause detection performance. The experimental results show that the filled pause detection rates using KLT and LDA were 84.4% and 86.8%, respectively. A significant improvement in the filled pause detection rate was obtained using the discriminative GMM with KLT and LDA. In addition, the LDA features outperformed the KLT features in the detection of filled pauses.
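
To make the role of LDA in the abstract more concrete, the following is a minimal sketch of a two-class Fisher discriminant that separates "filled pause" frames from "other" frames. It is an illustration under stated assumptions, not the authors' pipeline. The data are synthetic three-dimensional vectors rather than real acoustic features. All function and variable names (`fisher_direction`, `make_frames`, and so on) are hypothetical. The KLT step, the Bartlett test, and the discriminative GMM are not reproduced here.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of the deviations from the class mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fisher_direction(class_a, class_b):
    """Two-class LDA direction w = Sw^-1 (mean_a - mean_b)."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa = scatter_matrix(class_a, mean_a)
    sb = scatter_matrix(class_b, mean_b)
    d = len(mean_a)
    sw = [[sa[i][j] + sb[i][j] for j in range(d)] for i in range(d)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    return solve(sw, diff), mean_a, mean_b


def project(w, x):
    """Scalar projection of feature vector x onto direction w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def make_frames(rng, center, n, spread=1.0):
    """Synthetic stand-in for acoustic frames (assumption, not real data)."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


rng = random.Random(0)
filled = make_frames(rng, [2.0, 0.0, 1.0], 200)   # toy "filled pause" class
other = make_frames(rng, [0.0, 0.0, -1.0], 200)   # toy "other speech" class

w, mean_f, mean_o = fisher_direction(filled, other)
# Decide "filled pause" when the projection exceeds the midpoint threshold.
threshold = 0.5 * (project(w, mean_f) + project(w, mean_o))
hits = sum(project(w, x) > threshold for x in filled)
rejections = sum(project(w, x) <= threshold for x in other)
accuracy = (hits + rejections) / (len(filled) + len(other))
print(f"toy detection accuracy: {accuracy:.3f}")
```

The projection collapses each frame to a single discriminant score, which is the two-class special case of the multi-dimensional LDA feature selection described in the abstract. The accuracy printed here reflects only the synthetic data and says nothing about the detection rates reported in the paper.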