Decoding optimal state sequence with smooth state likelihoods

Authors:
I. Zeljkovic
Affiliations:
AT&T Bell Labs., Murray Hill, NJ, USA
Venue:
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01
Year:
1996

Cited by (4):
Designing and Evaluating an Adaptive Spoken Dialogue System. User Modeling and User-Adapted Interaction.
Restructuring Gaussian mixture density functions in speaker-independent acoustic models. Speech Communication.
Predicting automatic speech recognition performance using prosodic cues. NAACL 2000: Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference.
Automatic detection of poor speech recognition at the dialogue level. ACL '99: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics.

Abstract

A novel algorithm is presented that decodes hidden Markov model (HMM) state sequences while constraining the state likelihoods to be more uniform. In HMM-based speech recognizers, the decoded optimal state sequence is restricted by the HMM topology and the grammar. Thus, the most likely state sequence found by the Viterbi algorithm can be influenced by a few states with very high likelihoods, which often results in recognition errors. This paper presents a method for decoding state sequences with less volatile state probabilities by introducing penalties proportional to the difference between the current state likelihood and the highest state likelihood in that time frame. These penalties are added to the cumulative likelihoods in the Viterbi forward pass at every time frame. This technique, referred to as the smooth state likelihood decoding algorithm (SSLDA), substantially reduced recognition error rates in connected-digit tests on two speech databases collected in field trials. For variable-length digit strings, the error rate was reduced by more than 40% on one field-trial database and by more than 60% on the other.
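The abstract does not give the exact penalty function, so the following is only a minimal sketch of the idea, assuming a log-domain Viterbi decoder and a linear penalty with a hypothetical weight `alpha`: at each frame, state j is charged `alpha * (max_k log b_k(o_t) - log b_j(o_t))` on top of its usual score. The function name `ssl_viterbi`, the parameter `alpha`, and the toy scores are illustrative assumptions, not taken from the paper. The grammar constraint is omitted; the HMM topology is represented only through the transition scores (forbidden transitions use `float("-inf")`).

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def ssl_viterbi(
    log_init: Sequence[float],             # log initial-state scores, length N
    log_trans: Sequence[Sequence[float]],  # log a_ij, row i = from-state, col j = to-state
    log_emit: Sequence[Sequence[float]],   # log b_j(o_t), shape [T][N], T >= 1
    alpha: float = 0.5,                    # hypothetical penalty weight (assumption)
) -> Tuple[List[int], float]:
    """Viterbi decoding with a smooth-state-likelihood penalty (illustrative sketch).

    Each state's frame score is reduced by alpha times the gap between the
    best state log-likelihood in that frame and its own. alpha = 0 gives
    plain Viterbi.
    """
    T, N = len(log_emit), len(log_init)

    def penalized(t: int) -> List[float]:
        best = max(log_emit[t])
        # Keep impossible emissions at -inf (avoids 0 * inf = nan when alpha = 0)
        return [e if e == NEG_INF else e - alpha * (best - e) for e in log_emit[t]]

    # delta[j]: best cumulative penalized log score of any path ending in state j
    frame0 = penalized(0)
    delta = [log_init[j] + frame0[j] for j in range(N)]
    backptr: List[List[int]] = []

    for t in range(1, T):
        frame = penalized(t)
        new_delta, ptrs = [], []
        for j in range(N):
            # Best predecessor of state j under the transition scores
            i_best = max(range(N), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[i_best] + log_trans[i_best][j] + frame[j])
            ptrs.append(i_best)
        delta = new_delta
        backptr.append(ptrs)

    # Trace back from the best-scoring final state
    last = max(range(N), key=lambda j: delta[j])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state example (made-up scores): staying in a state is cheaper than switching,
# and at frame 1 state 0 is 3 log units below the best state.
log_init = [0.0, 0.0]
log_trans = [[0.0, -2.0], [-2.0, 0.0]]
log_emit = [[0.0, -2.0], [-3.0, 0.0], [0.0, -2.0]]

print(ssl_viterbi(log_init, log_trans, log_emit, alpha=0.0))  # → ([0, 0, 0], -3.0)
print(ssl_viterbi(log_init, log_trans, log_emit, alpha=1.0))  # → ([0, 1, 0], -4.0)
```

In this toy case, plain Viterbi (`alpha=0`) stays in state 0 through frame 1 even though that state's likelihood there is far below the frame's best, because the high-likelihood frames around it compensate. With the penalty, the decoded path instead follows states whose likelihoods stay close to each frame's best, which is the kind of more uniform state likelihood the abstract describes.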