We present a scalable, medium bit-rate, wide-band audio coding technique based on frequency-domain linear prediction (FDLP). FDLP is an efficient method for representing the long-term amplitude modulations of speech and audio signals using autoregressive models. In the proposed codec, relatively long temporal segments (1000 ms) of the input audio signal are decomposed into a set of critically sampled sub-bands using a quadrature mirror filter (QMF) bank. FDLP is then applied to each sub-band to model its temporal envelope. The residual of the linear prediction, which represents the frequency modulations in the sub-band signal, is encoded and transmitted along with the envelope parameters. These steps are reversed at the decoder to reconstruct the signal. The proposed codec uses a simple, signal-independent, non-adaptive compression mechanism for a wide class of speech and audio signals. Subjective and objective quality evaluations show that the quality of the reconstructed signal for the proposed FDLP codec compares well with state-of-the-art audio codecs in the 32-64 kbps range.
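The envelope-modeling step can be illustrated with a minimal sketch. It is not the codec's implementation. It only shows the general FDLP idea: linear prediction is run on a cosine transform of the signal instead of on the signal itself, so the resulting all-pole model traces the temporal envelope rather than the spectral envelope. The function names (`dct2`, `levinson`, `fdlp_envelope`), the model order of 12, the unnormalized DCT-II, and the biased autocorrelation estimate are all assumptions made for illustration. The QMF decomposition, quantization, and residual coding are left out.

```python
import math

def dct2(x):
    """Unnormalized DCT-II: c[k] = sum_t x[t] * cos(pi * (t + 0.5) * k / N)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation lags r[0..order].

    Returns (a, err) where a[0] == 1 defines the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order=12):
    """Approximate the squared temporal envelope of x (illustrative sketch).

    Linear prediction is applied to the DCT of x, so the all-pole model's
    "spectrum" is indexed by time rather than by frequency.
    """
    n = len(x)
    c = dct2(x)
    # Biased autocorrelation of the DCT sequence, lags 0..order.
    r = [sum(c[i] * c[i + k] for i in range(n - k)) / n
         for k in range(order + 1)]
    a, err = levinson(r, order)
    env = []
    for t in range(n):
        w = math.pi * (t + 0.5) / n         # time index t mapped to an angle in (0, pi)
        re = sum(a[j] * math.cos(w * j) for j in range(order + 1))
        im = -sum(a[j] * math.sin(w * j) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env, a, err

if __name__ == "__main__":
    # Hypothetical test signal: a tone whose amplitude peaks around sample 80.
    n = 256
    x = [(math.exp(-0.5 * ((t - 80) / 10.0) ** 2) + 0.05) * math.cos(0.9 * t)
         for t in range(n)]
    env, a, err = fdlp_envelope(x)
    peak = max(range(n), key=lambda t: env[t])
    print("envelope peak near sample", peak)
```

In a codec of the kind described, the envelope would be carried by a compact parameter set (here the coefficients `a` and the gain `err`), while the prediction residual would be coded separately. The pure-Python transform is quadratic in the segment length and is meant only for short illustrative inputs.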