In this paper, we analyze the performance of network speech recognition (NSR) over IP networks, adapting existing solutions and proposing new ones for the packet loss problem in code-excited linear prediction (CELP) codecs. NSR has a client-server architecture that places the recognizer at the server side and uses a standard speech codec for speech transmission. Its main advantage is that no changes are required to existing client devices and networks. However, the use of speech codecs degrades recognition performance, mainly in the presence of packet losses. First, we study the degradations introduced by CELP codecs in lossy packet networks. We then propose a reconstruction technique based on minimum mean square error (MMSE) estimation using hidden Markov models. This approach also yields a reliability measure for each estimate. We show how to use this information to improve recognition performance by means of soft-data decoding and the weighted Viterbi algorithm. Experimental results are obtained for two well-known CELP codecs, G.729 and AMR 12.2 kbps, with recognition carried out on decoded speech. Finally, we analyze an efficient and improved implementation of the proposed techniques using an NSR system that extracts speech recognition features directly from the bit-stream parameters. The experimental results show that the proposed NSR systems achieve performance comparable to distributed speech recognition (DSR).
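To make the idea of MMSE reconstruction with reliability measures concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a single scalar codec parameter is quantized to a small codebook, that the sequence of quantization indices is modeled by a first-order Markov chain with strictly positive transition probabilities, and that the frames immediately before and after the loss burst were received. The function name `mmse_reconstruct`, its arguments, and the mapping from posterior variance to a weight are all hypothetical choices made for illustration.

```python
def mmse_reconstruct(centroids, trans, prev_idx, next_idx, gap):
    """Estimate `gap` lost frames between two received codebook indices.

    centroids: codebook values, one per index (assumed scalar here)
    trans:     trans[i][j] = P(index j at frame t+1 | index i at frame t),
               assumed strictly positive so the posterior is well defined
    Returns a list of (estimate, variance, weight), one tuple per lost frame.
    """
    n = len(centroids)

    # Forward pass: alpha[t][j] = P(index j at lost frame t | prev_idx)
    alpha = [list(trans[prev_idx])]
    for _ in range(gap - 1):
        last = alpha[-1]
        alpha.append([sum(last[i] * trans[i][j] for i in range(n))
                      for j in range(n)])

    # Backward pass: beta[t][i] = P(next_idx after the gap | index i at lost frame t)
    beta = [[trans[i][next_idx] for i in range(n)]]
    for _ in range(gap - 1):
        nxt = beta[0]
        beta.insert(0, [sum(trans[i][j] * nxt[j] for j in range(n))
                        for i in range(n)])

    results = []
    for a, b in zip(alpha, beta):
        # Posterior over codebook indices for this lost frame
        post = [x * y for x, y in zip(a, b)]
        z = sum(post)
        post = [p / z for p in post]

        # MMSE estimate is the posterior mean of the codebook values
        mean = sum(p * c for p, c in zip(post, centroids))
        # Posterior variance serves as an (inverse) reliability measure
        var = sum(p * (c - mean) ** 2 for p, c in zip(post, centroids))
        # Illustrative mapping to a weight in (0, 1]; an assumption, not the
        # paper's formula. A weighted Viterbi decoder would scale the frame's
        # observation log-likelihood by a factor of this kind.
        weight = 1.0 / (1.0 + var)
        results.append((mean, var, weight))
    return results


if __name__ == "__main__":
    codebook = [0.0, 1.0]
    transitions = [[0.9, 0.1], [0.1, 0.9]]
    for est, var, w in mmse_reconstruct(codebook, transitions, 0, 1, 3):
        print(f"estimate={est:.3f} variance={var:.3f} weight={w:.3f}")
```

In this sketch, a lost frame whose neighbours agree gets a low posterior variance and a weight near 1, while a frame in the middle of a transition gets a higher variance and contributes less to the decoder's path score. A soft-data decoder could use the variance directly, for example by widening the acoustic model's Gaussians, instead of converting it to a weight.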