Wide-band audio coding based on frequency-domain linear prediction

Authors:
Petr Motlicek;Sriram Ganapathy;Hynek Hermansky;Harinath Garudadri
Affiliations:
Idiap Research Institute, Martigny, Switzerland;Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland;Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland;Qualcomm Inc., San Diego, CA
Venue:
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on scalable audio-content analysis
Year:
2010

Abstract

We revisit an original concept of speech coding in which the signal is separated into a carrier modulated by the signal envelope. A recently developed technique, called frequency-domain linear prediction (FDLP), is applied for efficient estimation of the envelope. Processing in the temporal domain allows a straightforward emulation of forward temporal masking. This, combined with an efficient nonuniform sub-band decomposition and the application of noise shaping in the spectral domain instead of the temporal domain (a technique to suppress artifacts in tonal audio signals), yields a codec that does not rely on the linear speech production model but rather uses the well-accepted concept of frequency-selective auditory perception. As such, the codec is suited not only to coding speech but also to coding other important acoustic signals such as music and mixed content. The quality of the proposed codec at 66 kbps is evaluated using objective and subjective quality assessments. The evaluation indicates performance competitive with MPEG codecs operating at similar bit rates.
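
The abstract names FDLP as the envelope estimator without giving details. As a rough illustration only, the general idea can be sketched as follows: linear prediction is applied to the cosine transform of a frame rather than to the frame itself, so the resulting all-pole model approximates the squared temporal envelope instead of the spectral envelope. The function names (dct2, fdlp_envelope), the model order, and the full-band (no sub-band) setup are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def dct2(x):
    """Unnormalised DCT-II via an explicit cosine matrix (adequate for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]          # transform index
    t = np.arange(n)[None, :]          # time index
    return np.cos(np.pi * (t + 0.5) * k / n) @ x

def fdlp_envelope(x, order=10):
    """All-pole approximation of the squared temporal envelope of x.

    Hypothetical helper: fits a linear predictor to the DCT coefficients
    of the frame and evaluates its power response on a time grid.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct2(x)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - lag], c[lag:]) for lag in range(order + 1)])
    # Yule-Walker normal equations: R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))
    gain = r[0] + np.dot(a[1:], r[1:])   # prediction-error energy
    # Angle pi*(t+0.5)/n on the unit circle corresponds to time sample t.
    w = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

if __name__ == "__main__":
    # Tone carrier modulated by a Gaussian bump centred at sample 300.
    t = np.arange(512)
    x = (0.05 + np.exp(-((t - 300) / 20.0) ** 2)) * np.cos(0.9 * t)
    env = fdlp_envelope(x, order=10)
    print("envelope peak at sample", int(np.argmax(env)))
```

In this sketch the carrier would be whatever remains once the frame is divided by the square root of the modelled envelope. A codec along the lines described in the abstract would additionally apply the model per sub-band and quantize both parts, which is not shown here.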