Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments

Authors:
Peter Jančovič;Münevver Köküer
Affiliations:
School of Electronic, Electrical and Computer Engineering, University of Birmingham, Pritchatts Road, B15 2TT Birmingham, UK;School of Electronic, Electrical and Computer Engineering, University of Birmingham, Pritchatts Road, B15 2TT Birmingham, UK
Venue:
Speech Communication
Year:
2009

Abstract

In this paper, we propose a model for incorporating voicing information into a speech recognition system operating in noisy environments. The voicing information is estimated by a novel method that provides it for each filter-bank channel and does not require information about the fundamental frequency. The voicing information is modelled with the Bernoulli distribution. The voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated within a standard model and within two models that compensate for the effect of noise: the missing-feature model and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as within the noise-compensated models. The use of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database, where considerable improvements are observed.
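
The abstract does not give the estimation formulas, so the following is only a minimal sketch of what a per-channel Bernoulli voicing model could look like, not the authors' implementation. It assumes that each frame carries a binary voicing flag per filter-bank channel, that a frame-to-state alignment is already available (for example from a Viterbi pass), and that channels are treated as independent. The function names, the probability floor, and the toy data are all hypothetical, and the per-mixture modelling mentioned in the abstract is omitted for brevity.

```python
import math


def estimate_bernoulli(voicing_frames, state_alignment, num_states, floor=1e-3):
    """Estimate P(channel is voiced | state) by relative frequency.

    voicing_frames:  list of frames, each a list of 0/1 flags (one per channel).
    state_alignment: list of state indices, one per frame (assumed to be given,
                     e.g. by a Viterbi alignment; not computed here).
    floor:           hypothetical clipping value that keeps log() finite.
    """
    num_channels = len(voicing_frames[0])
    counts = [[0] * num_channels for _ in range(num_states)]
    totals = [0] * num_states
    for frame, state in zip(voicing_frames, state_alignment):
        totals[state] += 1
        for ch, voiced in enumerate(frame):
            counts[state][ch] += voiced
    params = []
    for s in range(num_states):
        row = []
        for ch in range(num_channels):
            # States with no aligned frames fall back to an uninformative 0.5.
            p = counts[s][ch] / totals[s] if totals[s] else 0.5
            row.append(min(max(p, floor), 1.0 - floor))
        params.append(row)
    return params


def voicing_log_prob(frame, state_params):
    """Log-probability of a binary voicing vector under independent Bernoullis."""
    return sum(
        math.log(p) if voiced else math.log(1.0 - p)
        for voiced, p in zip(frame, state_params)
    )


# Toy data: 4 frames, 3 channels, 2 states (illustrative values only).
frames = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
alignment = [0, 0, 1, 1]
params = estimate_bernoulli(frames, alignment, num_states=2)
score_state0 = voicing_log_prob([1, 1, 0], params[0])
score_state1 = voicing_log_prob([1, 1, 0], params[1])
```

In this toy example a frame voiced in the first two channels scores higher under state 0 than under state 1. One plausible use, again an assumption rather than something stated in the abstract, is to add such a voicing log-probability to the acoustic log-likelihood of the corresponding HMM state during decoding.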