Quality enhancement of compressed audio based on statistical conversion

Authors:
Demetrios Cantzos;Athanasios Mouchtaris;Chris Kyriakakis
Affiliations:
Integrated Media Systems Center (IMSC), University of Southern California, Los Angeles, CA and Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA;Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH-ICS), Heraklion, Crete, Greece and Department of Computer Science, University of Crete, Heraklion, Crete, Greece;Integrated Media Systems Center (IMSC), University of Southern California, Los Angeles, CA and Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
Venue:
EURASIP Journal on Audio, Speech, and Music Processing - Scalable Audio-Content Analysis
Year:
2008

Citing 5
Cited 1

Discrete cosine transform: algorithms, advantages, applications
Fundamentals of speech recognition
Pattern Recognition with Fuzzy Objective Function Algorithms
Residual modeling in music analysis-synthesis

ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction

ICASSP '01 Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02

Bandwidth extension of low bitrate compressed audio based on statistical conversion

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo

Abstract

Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals migrate from portable players with inexpensive headphones to higher-quality home audio systems, it is becoming evident that higher bit rates may be required to maintain transparency. We propose a novel method that enhances low bit rate encoded audio segments by applying multiband audio resynthesis methods in a postprocessing stage. Our algorithm employs the highly flexible generalized Gaussian mixture model, which offers a more accurate representation of audio features than the Gaussian mixture model. A novel residual conversion technique is applied, which significantly improves enhancement performance without excessive overhead. In addition, both cepstral and residual errors are dramatically decreased by a feature-alignment scheme that employs a sorting transformation. We also describe improvements to the quantization step that further reduce the algorithm overhead. Signal enhancement examples are presented; the overhead incurred by the algorithm is a fraction of the uncompressed signal size, and the resulting audio quality is comparable to that of a standard perceptual codec operating at approximately the same bit rate.
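
As a rough illustration of what mixture-based statistical conversion can look like, the sketch below maps a source feature vector x to an estimate of a target feature vector y using the conditional mean of a joint mixture model. It is not the authors' algorithm: it assumes ordinary Gaussian components rather than the generalized Gaussian mixture model named in the abstract, it omits residual conversion, feature alignment, and quantization, and every name in it (`gaussian_pdf`, `convert`, and the parameter names) is hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm


def convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Conditional-mean conversion under a joint Gaussian mixture.

    Assumed (hypothetical) parameters, one entry per mixture component:
      weights : prior probability of the component
      mu_x    : mean of the source features
      mu_y    : mean of the target features
      cov_xx  : covariance of the source features
      cov_yx  : cross-covariance between target and source features
    """
    x = np.asarray(x, dtype=float)

    # Posterior probability of each component given the source vector.
    likes = np.array([
        w * gaussian_pdf(x, m, c)
        for w, m, c in zip(weights, mu_x, cov_xx)
    ])
    post = likes / likes.sum()

    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros(np.shape(mu_y[0]), dtype=float)
    for p, mx, my, cxx, cyx in zip(post, mu_x, mu_y, cov_xx, cov_yx):
        y += p * (my + cyx @ np.linalg.solve(cxx, x - mx))
    return y


# Usage with a single component whose statistics encode y = 2x + 1.
weights = [1.0]
mu_x = [np.array([0.0])]
mu_y = [np.array([1.0])]
cov_xx = [np.array([[1.0]])]
cov_yx = [np.array([[2.0]])]
converted = convert([3.0], weights, mu_x, mu_y, cov_xx, cov_yx)
```

With one component the mapping reduces to a single linear regression; with several components, the posteriors blend the per-component regressions smoothly across the feature space. In practice the mixture parameters would be estimated from paired training features, which this sketch leaves out.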