Combined speech enhancement and auditory modelling for robust distributed speech recognition

Authors:
Ronan Flynn;Edward Jones
Affiliations:
Department of Electronic Engineering, Athlone Institute of Technology, Ireland;Department of Electronic Engineering, National University of Ireland, Galway, Ireland
Venue:
Speech Communication
Year:
2008

Abstract

The performance of automatic speech recognition (ASR) systems in the presence of noise has attracted considerable research interest. Additive noise from interfering noise sources and convolutional noise arising from transmission channel characteristics both degrade the performance of ASR systems. This paper addresses the robustness of speech recognition systems to the first of these, namely additive noise. In particular, the paper examines the use of the auditory model of Li et al. [Li, Q., Soong, F.K., Siohan, O., 2000. A high-performance auditory feature for robust speech recognition. In: Proc. 6th Internat. Conf. on Spoken Language Processing (ICSLP), Vol. III. pp. 51-54] as a front-end for an HMM-based speech recognition system. The choice of this auditory model is motivated by the results of a previous study by Flynn and Jones [Flynn, R., Jones, E., 2006. A comparative study of auditory-based front-ends for robust speech recognition using the Aurora 2 database. In: Proc. IET Irish Signals and Systems Conf., Dublin, Ireland. pp. 111-116], in which it exhibited superior performance for robust speech recognition on the Aurora 2 database [Hirsch, H.G., Pearce, D., 2000. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: Proc. ISCA ITRW ASR2000, Paris, France. pp. 181-188]. In the speech recognition system described here, the input speech is pre-processed using a speech enhancement algorithm. Several speech enhancement methods, each combined with the auditory front-end of Li et al., are evaluated for robust connected digit recognition. The ETSI basic [ETSI ES 201 108 Ver. 1.1.3, 2003. Speech processing, transmission and quality aspects (STQ); distributed speech recognition; front-end feature extraction algorithm; compression algorithms] and advanced [ETSI ES 202 050 Ver. 1.1.5, 2007. Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms] front-ends proposed for distributed speech recognition (DSR) serve as baselines for comparison. In addition to their effect on recognition performance, the speech enhancement algorithms are assessed using perceptual speech quality tests, to examine whether perceived speech quality correlates with recognition performance. Results indicate that combining speech enhancement pre-processing with the auditory model front-end improves recognition performance in noisy conditions relative to the ETSI front-ends.
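The abstract does not name the enhancement methods evaluated. As a hedged illustration of what an enhancement pre-processing stage for additive noise does, the sketch below applies power spectral subtraction. This is one well-known family of single-channel enhancement techniques, chosen here only for illustration, and it is not necessarily among the methods evaluated in the paper. The sketch assumes that short-time power spectra |Y(k, m)|^2 have already been computed from the noisy input and that the first few frames contain noise only. The over-subtraction factor `alpha` and spectral floor `beta` take illustrative values, and the function name `spectral_subtract` is hypothetical.

```python
from typing import List


def spectral_subtract(
    power_frames: List[List[float]],
    noise_frames: int = 10,
    alpha: float = 2.0,
    beta: float = 0.01,
) -> List[List[float]]:
    """Power spectral subtraction on precomputed |Y(k, m)|^2 frames.

    Illustrative assumptions (not the paper's configuration):
    - the first `noise_frames` frames contain noise only;
    - `alpha` is the over-subtraction factor, `beta` the spectral floor.
    """
    if not power_frames:
        return []
    # Use at least one frame for the noise estimate, at most all frames.
    n_init = max(1, min(noise_frames, len(power_frames)))
    n_bins = len(power_frames[0])
    # Average the leading noise-only frames to estimate |N(k)|^2 per bin.
    noise = [
        sum(frame[k] for frame in power_frames[:n_init]) / n_init
        for k in range(n_bins)
    ]
    enhanced = []
    for frame in power_frames:
        # Subtract the scaled noise estimate, but never go below beta * |Y|^2,
        # which keeps every power estimate positive.
        enhanced.append(
            [max(p - alpha * noise[k], beta * p) for k, p in enumerate(frame)]
        )
    return enhanced
```

For example, with two noise-only frames of `[1.0, 2.0]` followed by a frame of `[5.0, 10.0]` and `noise_frames=2`, the last frame becomes `[3.0, 6.0]`, while the noise-only frames drop to the floor (`beta` times their input power). The floor trades some residual noise for the absence of negative power estimates. In a complete system the enhanced spectra would still have to be turned back into a waveform, or passed on in some other form, before feature extraction, and that step lies outside this sketch.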