An excitation level based psychoacoustic model for audio compression

  • Authors:
  • Ye Wang; Miikka Vilermo

  • Affiliations:
  • Nokia Research Center, Speech and Audio Systems Lab, Tampere, Finland; Nokia Research Center, Speech and Audio Systems Lab, Tampere, Finland

  • Venue:
  • MULTIMEDIA '99: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1)
  • Year:
  • 1999

Abstract

This paper describes an excitation-level-based psychoacoustic model for estimating the simultaneous masking threshold in audio coding. The model has six stages: 1) a windowing function; 2) a time-to-frequency transformation; 3) an excitation level calculation block similar to that in Moore and Glasberg's loudness model; 4) a correction factor for estimating the masking threshold; 5) the inclusion of the absolute threshold of hearing; and 6) the output signal-to-mask ratio (SMR). We evaluated the model by integrating it into an audio coder similar to MPEG-2 AAC that contains only the basic coding tools. For all test signals, our model performs as well as or better than the psychoacoustic model suggested in the MPEG-2 AAC audio coding standard. We achieve nearly transparent quality at bitrates below 64 kbps for most of the critical test signals. Significant improvements are obtained for speech signals, which are notoriously difficult for transform audio coders.
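The six stages listed in the abstract can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes numpy, a Hann window, an FFT power spectrum, bands one ERB wide, a symmetric and level-independent roex(p) auditory filter built on Glasberg and Moore's ERB formula for the excitation (the published loudness model additionally uses level-dependent filter skirts and outer/middle-ear filtering, both omitted here), and Terhardt's approximation of the threshold in quiet. The correction factor `offset_db` and the digital-to-SPL calibration `spl_offset_db` are hypothetical placeholders, because the abstract gives no values for them; all function names are likewise illustrative.

```python
import numpy as np


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_number(f_hz):
    """Map frequency in Hz to the ERB-number scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(e):
    """Inverse of erb_number()."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f_khz = np.maximum(f_hz, 20.0) / 1000.0  # clamp to avoid the singularity at 0 Hz
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def smr_per_band(frame, fs, offset_db=-6.0, spl_offset_db=90.0, step_erb=1.0):
    """Return (band centre frequencies in Hz, SMR in dB) for one frame.

    offset_db (stage 4) and spl_offset_db (digital-to-SPL calibration)
    are HYPOTHETICAL placeholders, not values taken from the paper.
    """
    n = len(frame)
    # Stage 1: windowing (a Hann window is assumed).
    window = np.hanning(n)
    # Stage 2: time-to-frequency transform, giving a power spectrum.
    power = np.abs(np.fft.rfft(frame * window)) ** 2 / np.sum(window ** 2)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Band b spans [b, b + 1) * step_erb on the ERB-number scale;
    # its centre is (b + 0.5) * step_erb.
    centre_erbs = np.arange(step_erb / 2, erb_number(fs / 2.0), step_erb)
    centres = erb_number_to_hz(centre_erbs)
    n_bands = len(centres)

    # Stage 3: excitation = power spectrum weighted by a symmetric roex(p)
    # filter centred on each band, with p = 4 * fc / ERB(fc).
    p = 4.0 * centres / erb_hz(centres)
    g = np.abs(freqs[None, :] - centres[:, None]) / centres[:, None]
    weights = (1.0 + p[:, None] * g) * np.exp(-p[:, None] * g)
    excitation = weights @ power

    # Signal energy per band: sum of the FFT bins that fall inside the band.
    band_of_bin = np.floor(erb_number(freqs) / step_erb).astype(int)
    energy = np.bincount(band_of_bin, weights=power, minlength=n_bands)[:n_bands]

    eps = 1e-12  # avoids log10(0) for silent bands
    excitation_db = 10.0 * np.log10(excitation + eps) + spl_offset_db
    energy_db = 10.0 * np.log10(energy + eps) + spl_offset_db

    # Stage 4: placeholder correction factor applied to the excitation level.
    masked_db = excitation_db + offset_db
    # Stage 5: include the absolute threshold (taking the maximum is one simple choice).
    threshold_db = np.maximum(masked_db, threshold_in_quiet_db(centres))
    # Stage 6: signal-to-mask ratio per band.
    return centres, energy_db - threshold_db


if __name__ == "__main__":
    fs = 48000
    t = np.arange(1024) / fs
    centres, smr = smr_per_band(np.sin(2 * np.pi * 1000.0 * t), fs)
    for fc, s in zip(centres[:20], smr[:20]):
        print(f"{fc:8.1f} Hz  SMR {s:6.1f} dB")
```

Taking the maximum of the masked threshold and the threshold in quiet is only one way to combine them; a power sum is a common alternative. In an AAC-like coder, per-band SMR values of this kind would then be mapped onto the coder's scale-factor bands to guide bit allocation.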