A fast audio classification from MPEG coded data

  • Authors:
  • Y. Nakajima;Yang Lu;M. Sugano;A. Yoneyama;H. Yamagihara;A. Kurematsu

  • Affiliations:
  • KDD R, Saitama, Japan;-;-;-;-;-

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 06
  • Year:
  • 1999

Abstract

Audio classification has become an important task for applications such as automatic keyword spotting and other content-based audio-visual query systems. In this paper, we describe a fast and accurate audio classification method that operates in the MPEG coded data domain. First, silent segments are detected using an approach that is robust to different recording conditions. The non-silent segments are then classified into three types (music, speech, and applause) using the temporal density, bandwidth, and center frequency of the subband energy. To be as robust as possible across a variety of audio sources, we use a Bayes discriminant function for multivariate Gaussian distributions instead of manually adjusting a threshold for each discriminator. In the experiment, each one-second segment of MPEG audio data is classified, and about 90% of audio and speech segments are successfully detected. The detection requires less than 20% of the processing power needed for MPEG audio decoding.
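
As an illustration of the classification step described in the abstract, the following is a minimal sketch of a Bayes discriminant function for multivariate Gaussian class models. It is not the authors' implementation. The function names, the feature ordering (temporal density, bandwidth, center frequency), and every mean, covariance, and prior value in `MODELS` are hypothetical placeholders; in practice they would be estimated from labeled training segments. The sketch assumes each covariance matrix is symmetric positive definite and drops the constant term shared by all classes.

```python
import math


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be SPD)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_discriminant(x, mean, cov, prior):
    """g(x) = -0.5 * d^T cov^-1 d - 0.5 * ln|cov| + ln(prior), d = x - mean."""
    L = cholesky(cov)
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L y = d, so d^T cov^-1 d == sum(y_i^2).
    y = []
    for i in range(len(d)):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    mahalanobis_sq = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(d)))
    return -0.5 * mahalanobis_sq - 0.5 * log_det + math.log(prior)


def classify(x, models):
    """Pick the class whose discriminant value is largest for feature vector x."""
    return max(models, key=lambda label: gaussian_discriminant(x, *models[label]))


# Hypothetical class models: (mean, covariance, prior). Placeholder numbers only,
# not taken from the paper. Feature order: temporal density, bandwidth,
# center frequency.
MODELS = {
    "music": ([0.2, 12.0, 8.0],
              [[0.01, 0.0, 0.0], [0.0, 4.0, 1.0], [0.0, 1.0, 4.0]], 1 / 3),
    "speech": ([0.6, 6.0, 4.0],
               [[0.02, 0.0, 0.0], [0.0, 2.0, 0.5], [0.0, 0.5, 2.0]], 1 / 3),
    "applause": ([0.9, 20.0, 14.0],
                 [[0.01, 0.0, 0.0], [0.0, 9.0, 2.0], [0.0, 2.0, 9.0]], 1 / 3),
}

if __name__ == "__main__":
    # One hypothetical feature vector for a one-second non-silent segment.
    print(classify([0.55, 6.5, 4.2], MODELS))
```

Because each class carries its own mean and covariance, the decision boundaries follow from the training statistics, which is the motivation the abstract gives for preferring this approach over a hand-tuned threshold per feature.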