Quality enhancement of compressed audio based on statistical conversion

  • Authors:
  • Demetrios Cantzos; Athanasios Mouchtaris; Chris Kyriakakis

  • Affiliations:
  • Integrated Media Systems Center (IMSC), University of Southern California, Los Angeles, CA and Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
  • Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH-ICS), Heraklion, Crete, Greece and Department of Computer Science, University of Crete, Heraklion, Crete, Greece
  • Integrated Media Systems Center (IMSC), University of Southern California, Los Angeles, CA and Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA

  • Venue:
  • EURASIP Journal on Audio, Speech, and Music Processing - Scalable Audio-Content Analysis
  • Year:
  • 2008

Abstract

Most audio compression formats are built around the idea of low-bit-rate transparent encoding. As compressed audio migrates from portable players with inexpensive headphones to higher-quality home audio systems, it is becoming evident that higher bit rates may be required to maintain transparency. We propose a novel method that enhances low-bit-rate encoded audio segments by applying multiband audio resynthesis methods in a postprocessing stage. Our algorithm employs the highly flexible generalized Gaussian mixture model, which offers a more accurate representation of audio features than the Gaussian mixture model. A novel residual conversion technique significantly improves the enhancement performance without excessive overhead. In addition, a feature-alignment scheme that employs a sorting transformation dramatically decreases both cepstral and residual errors. We also describe improvements to the quantization step that further reduce the algorithm's overhead. Signal enhancement examples are presented; the results show that the overhead incurred by the algorithm is a fraction of the uncompressed signal size, and that the resulting audio quality is comparable to that of a standard perceptual codec operating at approximately the same bit rate.
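
As a rough illustration of why a generalized Gaussian mixture can be more flexible than a Gaussian mixture, the sketch below evaluates the standard univariate generalized Gaussian density, which has an extra shape parameter: shape 2 recovers the Gaussian and shape 1 the Laplacian. This is not the paper's implementation. The univariate form, the function names, and the parameter names (mu, alpha, beta) are assumptions made here for illustration; the paper's feature vectors and model parameterization may differ.

```python
import math


def generalized_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Univariate generalized Gaussian density (illustrative parameterization).

    mu is the location, alpha > 0 the scale, beta > 0 the shape.
    beta = 2 gives a Gaussian, beta = 1 a Laplacian.
    """
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Ordinary Gaussian density, used only for comparison."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ggmm_pdf(x, weights, mus, alphas, betas):
    """Density of a mixture of generalized Gaussians (weights should sum to 1)."""
    return sum(
        w * generalized_gaussian_pdf(x, m, a, b)
        for w, m, a, b in zip(weights, mus, alphas, betas)
    )


if __name__ == "__main__":
    # With beta = 2 and alpha = sigma * sqrt(2) the two densities coincide.
    print(generalized_gaussian_pdf(0.7, alpha=math.sqrt(2.0), beta=2.0))
    print(gaussian_pdf(0.7))
    # A two-component mixture with one peaky and one flat-topped component.
    print(ggmm_pdf(0.3, [0.4, 0.6], [0.0, 1.0], [1.0, 0.5], [1.0, 4.0]))
```

Because each component carries its own shape parameter, a mixture of this kind can represent both heavy-tailed and flat-topped feature distributions that a Gaussian mixture would need additional components to approximate.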