Low bit rate compression methods of feature vectors for distributed speech recognition

Authors:
Jose Enrique Garcia;Alfonso Ortega;Antonio Miguel;Eduardo Lleida
Venue:
Speech Communication
Year:
2014

Abstract

In this paper, we present a family of compression methods based on differential vector quantization (DVQ) for encoding Mel-frequency cepstral coefficients (MFCC) in distributed speech recognition (DSR) applications. The proposed techniques exploit both the temporal correlation across consecutive MFCC frames and the intra-frame redundancy. We present DVQ schemes based on linear prediction and on non-linear prediction with multi-layer perceptrons (MLP). In addition, we propose a multipath search coding strategy based on the M-algorithm that obtains the sequence of centroids minimizing the quantization error globally, instead of selecting the centroid that minimizes the quantization error locally on a frame-by-frame basis. We evaluated the proposed methods on two kinds of task. For small-vocabulary recognition, on the SpeechDat-Car and Aurora 2 databases, the degradation in word accuracy relative to the unquantized scheme is negligible (around 1%) at bit rates as low as 0.5 kbps. For a large-vocabulary task (Aurora 4), the proposed method achieves a word error rate (WER) comparable to that of the unquantized scheme at only 1.6 kbps. Moreover, we propose a combined differential/non-differential scheme that gives the system the same sensitivity to transmission errors as previous multi-frame coding proposals for DSR.
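
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a first-order predictor with a fixed scalar coefficient `alpha` (the paper's linear and MLP predictors are not reproduced here), a small hand-written residual codebook instead of a trained one, and squared error as the distortion measure. The names `dvq_encode`, `dvq_decode`, `sq_dist`, `alpha` and `m` are hypothetical. With `m=1` the search is the greedy frame-by-frame choice; with larger `m` it keeps the `m` lowest-error partial paths at each frame, in the spirit of the M-algorithm.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dvq_encode(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    alpha: float = 0.9,   # assumed fixed first-order prediction coefficient
    m: int = 1,           # number of surviving paths per frame
) -> Tuple[List[int], float]:
    """Differential VQ with a multipath (M-algorithm style) search.

    Each frame is predicted as alpha * previous reconstruction, and a
    codebook entry is added as the quantized prediction residual.
    Returns the chosen codebook indices and the accumulated squared error.
    """
    dim = len(frames[0])
    # A path is (accumulated error, indices so far, last reconstruction).
    paths = [(0.0, [], [0.0] * dim)]
    for x in frames:
        candidates = []
        for err, idxs, prev in paths:
            pred = [alpha * p for p in prev]
            for k, c in enumerate(codebook):
                recon = [p + ci for p, ci in zip(pred, c)]
                candidates.append((err + sq_dist(x, recon), idxs + [k], recon))
        # Keep only the m partial paths with the lowest accumulated error.
        candidates.sort(key=lambda t: t[0])
        paths = candidates[:m]
    best_err, best_idxs, _ = paths[0]
    return best_idxs, best_err


def dvq_decode(
    indices: Sequence[int],
    codebook: Sequence[Sequence[float]],
    alpha: float = 0.9,
) -> List[List[float]]:
    """Rebuild the frames from indices using the same predictor as the encoder."""
    prev = [0.0] * len(codebook[0])
    out = []
    for k in indices:
        prev = [alpha * p + c for p, c in zip(prev, codebook[k])]
        out.append(prev)
    return out


if __name__ == "__main__":
    # Toy 2-dimensional "feature frames" and residual codebook (made-up values).
    frames = [[1.0, 0.5], [1.2, 0.4], [0.2, -0.3], [0.1, 0.0]]
    codebook = [[-1.0, -0.5], [0.0, 0.0], [0.5, 0.25], [1.0, 0.5]]
    for m in (1, 4):
        idxs, err = dvq_encode(frames, codebook, m=m)
        print(f"m={m}: indices={idxs}, total squared error={err:.4f}")
```

In this sketch the decoder only needs the index sequence, the codebook and `alpha`, so the bit rate is set by the codebook size and the frame rate. Keeping more paths does not guarantee a lower error than the greedy choice unless every path is kept, but it lets the encoder accept a locally worse centroid when that leads to a better sequence overall.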