Modelling the prepausal lengthening effect for speech recognition: a dynamic Bayesian network approach

  • Authors:
  • Ning Ma;Chris D. Bartels;Jeff A. Bilmes;Phil D. Green

  • Affiliations:
  • Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK;Department of Electrical Engineering, University of Washington, Seattle, 98195, USA;Department of Electrical Engineering, University of Washington, Seattle, 98195, USA;Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

The speech unit that precedes a pause tends to be lengthened. This work uses a dynamic Bayesian network to model this prepausal lengthening effect for robust speech recognition. Specifically, we introduce two distributions to model inter-state transitions, one for prepausal words and one for non-prepausal words. The choice between the two transition distributions depends on a random variable whose value is influenced by whether a pause appears between the current word and the following word. Two experiments are presented. The first considers pauses hypothesised during speech decoding. The second employs an extra component for speech/non-speech determination. Modelling the prepausal lengthening effect yields a 5.5% relative reduction in word error rate on the 500-word task of the SVitchboard corpus.
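
The idea of switching between two transition distributions can be illustrated with a minimal sketch. It is not the authors' implementation: the names `P_EXIT`, `expected_duration` and `sample_duration` are hypothetical, the probabilities are made-up illustrative values, and the prepausal flag is treated as directly observed, whereas in the abstract it is a random variable influenced by whether a pause follows the word. Under these assumptions, a lower per-frame exit probability for prepausal words gives longer expected state durations.

```python
import random

# Hypothetical per-frame probabilities of leaving the current sub-word state.
# Illustrative assumptions only; these are not values from the paper.
P_EXIT = {
    False: 0.30,  # non-prepausal word
    True: 0.20,   # prepausal word: lower exit probability, so states last longer
}


def expected_duration(prepausal: bool) -> float:
    """Mean number of frames spent in a state under a geometric duration model."""
    return 1.0 / P_EXIT[prepausal]


def sample_duration(prepausal: bool, rng: random.Random) -> int:
    """Sample a state duration by testing for a transition at every frame."""
    frames = 1
    # Stay in the state until a draw falls below the selected exit probability.
    while rng.random() >= P_EXIT[prepausal]:
        frames += 1
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    for flag in (False, True):
        mean = sum(sample_duration(flag, rng) for _ in range(10000)) / 10000
        print(f"prepausal={flag}: expected {expected_duration(flag):.2f}, sampled {mean:.2f}")
```

In a full decoder the flag would not be given. It would be inferred jointly with the word sequence, either from pauses hypothesised during decoding or from a separate speech/non-speech component, as in the two experiments described above.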