A wavelet-based parameterization for speech/music discrimination

  • Authors:
  • E. Didiot; I. Illina; D. Fohr; O. Mella

  • Affiliations:
  • LORIA-CNRS and INRIA Nancy-Grand Est, BP 239, 54506 Vandoeuvre-lès-Nancy, France (all authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2010

Abstract

This paper addresses the problem of parameterization for speech/music discrimination. The standard successful parameterization, based on cepstral coefficients, uses the Fourier transform (FT), which is well adapted to stationary signals. To take the non-stationarity of speech and music signals into account, this work studies wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments were evaluated. Different types of energy, computed for each frequency band obtained from the wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterization is integrated into two class/non-class classifiers: one for speech/non-speech and one for music/non-music. Experiments on realistic corpora covering different styles of speech and music (Broadcast News, Entertainment, Scheirer) illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. It yielded a significant reduction of the error rate: more than 30% relative improvement over the MFCC parameterization was obtained on the tasks considered.
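The core idea of the parameterization — decomposing each signal frame into wavelet frequency bands and taking one energy value per band — can be sketched as follows. This is a minimal illustration using a hand-rolled Haar DWT, not the paper's implementation: the authors evaluate three wavelet families with varying numbers of vanishing moments and several energy types, none of which are reproduced here.

```python
# Hypothetical sketch of wavelet-band log-energy features.
# Assumption: a Haar wavelet and a plain log-energy per band; the paper
# studies other wavelet families, vanishing moments and energy types.
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.
    Returns (approximation, detail) coefficient lists."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_band_energies(frame, levels=3):
    """Decompose a frame into `levels` + 1 frequency bands and return the
    log energy of each band: one detail band per level, plus the final
    approximation (lowest-frequency) band."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(math.log(sum(d * d for d in detail) + 1e-10))
    energies.append(math.log(sum(a * a for a in approx) + 1e-10))
    return energies

# Example: a 64-sample frame of a pure tone.
frame = [math.sin(2 * math.pi * 8 * n / 64) for n in range(64)]
feats = wavelet_band_energies(frame, levels=3)
print(len(feats))  # 4 features: 3 detail bands + 1 approximation band
```

Because the Haar transform is orthonormal, the band energies sum to the frame energy (Parseval), so the feature vector is a genuine partition of the signal's energy across frequency bands — the property that lets these features replace FT-based cepstral energies.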