Low bit rate compression methods of feature vectors for distributed speech recognition

  • Authors:
  • Jose Enrique Garcia;Alfonso Ortega;Antonio Miguel;Eduardo Lleida

  • Affiliations:
  • -;-;-;-

  • Venue:
  • Speech Communication
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we present a family of compression methods based on differential vector quantization (DVQ) for encoding Mel frequency cepstral coefficients (MFCC) in distributed speech recognition (DSR) applications. The proposed techniques benefit from the existence of temporal correlation across consecutive MFCC frames as well as the presence of intra-frame redundancy. We present DVQ schemes based on linear prediction and non-linear methods with multi-layer perceptrons (MLP). In addition to this, we propose the use of a multipath search coding strategy based on the M-algorithm that obtains the sequence of centroids that minimize the quantization error globally instead of selecting the centroids that minimize the quantization error locally in a frame by frame basis. We have evaluated the performance of the proposed methods for two different tasks. On the one hand, two small-size vocabulary databases, Spechdat-Car and Aurora 2, have been considered obtaining negligible degradation in terms of Word Accuracy (around 1%) compared to the unquantized scheme for bit-rates as low as 0.5kbps. On the other hand, for a large vocabulary task (Aurora 4), the proposed method achieves a WER comparable to the unquantized scheme only with 1.6kbps. Moreover, we propose a combined scheme (differential/non-differential) that allows the system to present the same sensitivity to transmission errors than previous multi-frame coding proposals for DSR.