MMSE-based packet loss concealment for CELP-coded speech recognition

Authors:
José L. Carmona;Antonio M. Peinado;José L. Pérez-Córdoba;Angel M. Gómez
Affiliations:
Departamento de Teoría de la Señal, Telemática y Comunicaciones, Universidad de Granada, Granada, Spain (all authors)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

In this paper, we analyze the performance of network speech recognition (NSR) over IP networks, adapting existing solutions and proposing new ones to the packet loss problem for code-excited linear prediction (CELP) codecs. NSR uses a client-server architecture that places the recognizer on the server side and transmits speech with a standard speech codec. Its main advantage is that no changes are required to existing client devices or networks. However, speech coding degrades recognition performance, especially in the presence of packet losses. First, we study the degradations introduced by CELP codecs in lossy packet networks. Then, we propose a reconstruction technique based on minimum mean square error (MMSE) estimation using hidden Markov models. This approach also provides a reliability measure associated with each estimate. We show how to use this information to improve recognition performance by means of soft-data decoding and the weighted Viterbi algorithm. Experimental results are obtained for two well-known CELP codecs, G.729 and AMR at 12.2 kbps, with recognition performed on decoded speech. Finally, we analyze an efficient and improved implementation of the proposed techniques using an NSR system that extracts recognition features directly from the bit-stream parameters. The experimental results show that the proposed NSR systems achieve performance comparable to that of distributed speech recognition (DSR).
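The abstract describes reconstructing lost frames by MMSE estimation with a hidden Markov model, together with a reliability measure for each estimate. The sketch below illustrates that general idea. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: features are scalar, each HMM state carries a Gaussian density, and a lost frame contributes no evidence. The names `mmse_conceal` and `gauss_pdf` and all model parameters are hypothetical. For each lost frame, forward-backward state posteriors yield the MMSE estimate (the posterior mean) and the posterior variance, and the variance can serve as the reliability measure.

```python
import math
from typing import List, Optional, Sequence, Tuple


def gauss_pdf(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def _normalise(v: List[float]) -> List[float]:
    s = sum(v)
    return [x / s for x in v]


def mmse_conceal(
    obs: Sequence[Optional[float]],  # None marks a frame lost in transmission
    pi: Sequence[float],             # initial state probabilities
    A: Sequence[Sequence[float]],    # A[i][j] = P(state j at t+1 | state i at t)
    means: Sequence[float],          # per-state Gaussian means (hypothetical model)
    variances: Sequence[float],      # per-state Gaussian variances
) -> List[Tuple[float, float]]:
    """Return (estimate, posterior variance) per frame; received frames pass through."""
    n, T = len(pi), len(obs)

    def emit(t: int, i: int) -> float:
        # A lost frame carries no evidence, so its likelihood is constant.
        if obs[t] is None:
            return 1.0
        return gauss_pdf(obs[t], means[i], variances[i])

    # Forward pass, normalised at every frame to avoid underflow.
    alpha = [_normalise([pi[i] * emit(0, i) for i in range(n)])]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append(_normalise(
            [emit(t, j) * sum(prev[i] * A[i][j] for i in range(n)) for j in range(n)]
        ))

    # Backward pass; per-frame scaling cancels when posteriors are normalised.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = _normalise(
            [sum(A[i][j] * emit(t + 1, j) * beta[t + 1][j] for j in range(n))
             for i in range(n)]
        )

    out: List[Tuple[float, float]] = []
    for t in range(T):
        if obs[t] is not None:
            out.append((obs[t], 0.0))  # received frame: kept as is, fully reliable
            continue
        # State posteriors gamma_t(i) given all received frames.
        g = _normalise([alpha[t][i] * beta[t][i] for i in range(n)])
        est = sum(g[i] * means[i] for i in range(n))  # MMSE estimate
        second = sum(g[i] * (variances[i] + means[i] ** 2) for i in range(n))
        out.append((est, second - est ** 2))  # posterior variance = reliability
    return out


if __name__ == "__main__":
    A = [[0.9, 0.1], [0.1, 0.9]]
    print(mmse_conceal([1.0, None, 1.0], [0.5, 0.5], A, [-1.0, 1.0], [0.1, 0.1]))
```

A larger posterior variance marks a less reliable estimate. A weighted Viterbi decoder could, for example, down-weight the acoustic log-likelihood of frames with a large variance. The weighting function used in the paper is not reproduced here.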