A fusion study in speech / music classification

  • Authors:
  • J. Pinquier, J.-L. Rouas, R. André-Obrecht

  • Affiliations:
  • Institut de Recherche en Informatique de Toulouse (IRIT), CNRS, Toulouse, France (all authors)

  • Venue:
  • ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, Volume 2
  • Year:
  • 2003

Abstract

In this paper, we present and merge two speech/music classification approaches that we have developed. The first is a differentiated modeling approach based on spectral analysis and implemented with Gaussian mixture models (GMM). The second is based on three original features: entropy modulation, stationary segment duration, and number of segments, which are merged with the classical 4 Hz modulation energy. Our classification system is a fusion of the two approaches. It is divided into two classifications (speech/non-speech and music/non-music) and achieves 94% accuracy for speech detection and 90% for music detection with one second of input signal. Besides the spectral information and GMM classically used in speech/music discrimination, these simple parameters bring complementary and efficient information.
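
The paper itself does not include code; the Python sketch below only illustrates two of the ingredients the abstract names: a 4 Hz modulation-energy feature (a classical cue for the syllabic rhythm of speech) and GMM log-likelihood scoring for the speech/non-speech decision over a one-second window. All parameter values (frame length, filter band, number of mixture components) and function names are assumptions for illustration, not the authors' actual configuration.

```python
# Hedged sketch of two ingredients named in the abstract; parameters are assumed.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.mixture import GaussianMixture


def modulation_energy_4hz(signal, sr, frame_len=0.016):
    """Relative energy of the ~4 Hz modulation of the frame-energy envelope.

    Speech tends to show strong envelope modulation around 4 Hz (syllable
    rate); music usually does not. Frame length and filter band are assumed.
    """
    hop = int(frame_len * sr)
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    envelope = np.log1p(frames.std(axis=1))            # frame-energy envelope
    env_sr = 1.0 / frame_len                           # envelope sample rate (~62.5 Hz)
    # Band-pass the envelope around 4 Hz (3-5 Hz band, assumed design).
    b, a = butter(2, [3.0 / (env_sr / 2), 5.0 / (env_sr / 2)], btype="band")
    modulated = filtfilt(b, a, envelope - envelope.mean())
    return float(np.sum(modulated ** 2) / (np.sum(envelope ** 2) + 1e-9))


def train_gmm(features, n_components=16):
    """Fit one diagonal-covariance GMM per class on spectral feature vectors."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag").fit(features)


def classify(window_features, gmm_speech, gmm_nonspeech):
    """Speech/non-speech decision: compare average log-likelihoods of the
    two class models over all frames in a one-second window."""
    if gmm_speech.score(window_features) > gmm_nonspeech.score(window_features):
        return "speech"
    return "non-speech"


# Usage example with synthetic feature vectors (12-dimensional, MFCC-like):
# rng = np.random.default_rng(0)
# gmm_s = train_gmm(rng.normal(0.0, 1.0, (500, 12)))
# gmm_n = train_gmm(rng.normal(2.0, 1.0, (500, 12)))
# print(classify(rng.normal(0.0, 1.0, (60, 12)), gmm_s, gmm_n))
```

The same two-model likelihood comparison would apply to the music/non-music classification; fusing its output with the feature-based cues (entropy modulation, segment statistics, 4 Hz energy) is the combination step the paper evaluates.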