Adaptive long-term coding of LSF parameters trajectories for large-delay/very- to ultra-low bit-rate speech coding

Authors:
Laurent Girin
Affiliations:
Laboratoire Grenoblois des Images, de la Parole, du Signal, et de l'Automatique, Saint-Martin d'Heres, France
Venue:
EURASIP Journal on Audio, Speech, and Music Processing
Year:
2010



Abstract

This paper presents a model-based method for coding the LSF parameters of LPC speech coders on a "long-term" basis, that is, beyond the usual 20-30 ms frame duration. The objective is to provide efficient LSF quantization for a speech coder with a large delay but a very low to ultra-low bit rate (i.e., below 1 kb/s). To do this, speech is first segmented into voiced and unvoiced segments. A discrete cosine model of the time trajectory of the LSF vectors is then applied to each segment to capture the LSF interframe correlation over the whole segment. A bidirectional transformation between the model coefficients and a reduced set of LSF vectors enables both efficient "sparse" coding (here using multistage vector quantizers) and the generation of interpolated LSF vectors at the decoder. The proposed method provides up to 50% gain in bit rate over frame-by-frame quantization while preserving signal quality, and it competes favorably with 2D-transform coding in the lower range of tested bit rates. Moreover, because the long-term coding process implicitly interpolates over time, the technique has high potential for use in speech synthesis systems.
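
The trajectory-modeling step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the least-squares fit, the half-sample cosine basis, and the evenly spaced anchor positions are all assumptions made for illustration, and the multistage vector quantization of the reduced LSF vectors is left out. The sketch fits a cosine model to each LSF trajectory in a segment, samples the model at a few anchor positions to obtain the reduced set of LSF vectors, and then recovers the model from those vectors to interpolate all frames.

```python
import numpy as np

def dct_basis(times, order, n_frames):
    """Cosine basis: one row per time position, one column per order 0..order.
    The half-sample offset is an assumed convention, not taken from the paper."""
    times = np.asarray(times, dtype=float)
    p = np.arange(order + 1)
    return np.cos(np.pi * np.outer(times + 0.5, p) / n_frames)

def encode_segment(lsf, order):
    """lsf: array of shape (n_frames, n_lsf) for one segment, with order < n_frames.
    Returns (order + 1) reduced LSF vectors and their (assumed) anchor positions."""
    n_frames = lsf.shape[0]
    full_basis = dct_basis(np.arange(n_frames), order, n_frames)
    # Least-squares fit of the model coefficients, one column per LSF dimension.
    coeffs, *_ = np.linalg.lstsq(full_basis, lsf, rcond=None)
    # Evenly spaced anchor positions over the segment (assumption).
    anchors = np.linspace(0, n_frames - 1, order + 1)
    reduced = dct_basis(anchors, order, n_frames) @ coeffs
    return reduced, anchors

def decode_segment(reduced, anchors, n_frames):
    """Recover the model coefficients from the reduced LSF vectors,
    then evaluate the model at every frame of the segment."""
    order = reduced.shape[0] - 1
    coeffs = np.linalg.solve(dct_basis(anchors, order, n_frames), reduced)
    return dct_basis(np.arange(n_frames), order, n_frames) @ coeffs
```

In a complete coder, the reduced vectors returned by `encode_segment` would be quantized before transmission and `decode_segment` would operate on the quantized values. Without quantization, the decoder output is simply the fitted model trajectory.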