Autoregressive models of amplitude modulations in audio compression

Authors:
Sriram Ganapathy;Petr Motlicek;Hynek Hermansky
Affiliations:
Electrical and Computer Engineering Department, The Johns Hopkins University, Baltimore, MD;Idiap Research Institute, Martigny, Switzerland;Electrical and Computer Engineering Department, The Johns Hopkins University, Baltimore, MD
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

We present a scalable, medium bit-rate, wide-band audio coding technique based on frequency-domain linear prediction (FDLP). FDLP is an efficient method for representing the long-term amplitude modulations of speech and audio signals using autoregressive models. In the proposed codec, relatively long temporal segments (1000 ms) of the input audio signal are decomposed into a set of critically sampled sub-bands using a quadrature mirror filter (QMF) bank. FDLP is then applied to each sub-band to model its temporal envelope. The residual of the linear prediction, which represents the frequency modulations in the sub-band signal, is encoded and transmitted along with the envelope parameters. These steps are reversed at the decoder to reconstruct the signal. The proposed codec uses a simple, signal-independent, nonadaptive compression mechanism for a wide class of speech and audio signals. Subjective and objective quality evaluations show that the reconstructed signal quality of the proposed FDLP codec compares well with that of state-of-the-art audio codecs in the 32-64 kbps range.
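
The envelope-modelling step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that FDLP is realised as autocorrelation-method linear prediction on the DCT-II coefficients of a segment, so that the resulting all-pole "spectrum", read along the time axis, approximates the squared temporal envelope. The function names, the model order of 20, and the synthetic test signal are all hypothetical choices for illustration. The sketch works on a single full-band toy signal rather than on QMF sub-bands, and it omits the residual coding and quantization stages entirely.

```python
import cmath
import math


def dct_ii(x):
    """Unnormalised DCT-II, computed directly in O(N^2) for clarity."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_samples))
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def autocorrelation(seq, max_lag):
    """Autocorrelation sums of seq for lags 0..max_lag (no normalisation)."""
    return [
        sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err), where a[0] == 1 holds the coefficients of
    A(z) = 1 + a[1] z^-1 + ... and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=20):
    """All-pole approximation of the squared temporal envelope of x.

    Assumption: linear prediction is run on the DCT-II coefficients, and
    time index n maps to the angle pi * (2n + 1) / (2N) of the AR model.
    """
    n_samples = len(x)
    coeffs = dct_ii(x)
    a, err = levinson_durbin(autocorrelation(coeffs, order), order)
    envelope = []
    for n in range(n_samples):
        theta = math.pi * (2 * n + 1) / (2 * n_samples)
        response = sum(a[j] * cmath.exp(-1j * theta * j)
                       for j in range(order + 1))
        envelope.append(err / abs(response) ** 2)
    return envelope


# Hypothetical toy input: a carrier whose amplitude bulges around sample 350.
N = 512
CENTRE = 350
signal = [
    (0.2 + math.exp(-0.5 * ((n - CENTRE) / 30.0) ** 2))
    * math.cos(2 * math.pi * 0.12 * n)
    for n in range(N)
]
envelope = fdlp_envelope(signal, order=20)
peak = max(range(N), key=envelope.__getitem__)
print("envelope model peaks at sample", peak)
```

In a codec along the lines the abstract describes, the autoregressive coefficients would serve as the envelope parameters for each sub-band, and the sub-band signal with this envelope removed would be the residual that is coded separately. The direct O(N^2) DCT keeps the sketch dependency-free. A practical implementation would use a fast transform instead.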