A wavelet-based parameterization for speech/music discrimination

Authors:
E. Didiot; I. Illina; D. Fohr; O. Mella
Affiliations:
LORIA-CNRS and INRIA Nancy-Grand Est, BP 239, 54506 Vandoeuvre-lès-Nancy, France (all authors)
Venue:
Computer Speech and Language
Year:
2010

Abstract

This paper addresses the problem of parameterization for speech/music discrimination. The most successful current parameterization, based on cepstral coefficients, relies on the Fourier transform (FT), which is well suited to stationary signals. To account for the non-stationarity of speech and music signals, this work studies wavelet-based signal decomposition in place of the FT. Three wavelet families and several numbers of vanishing moments were evaluated. Different types of energy, computed for each frequency band obtained from the wavelet decomposition, were studied, and static, dynamic and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech and one for music/non-music. Experiments on realistic corpora covering different styles of speech and music (Broadcast News, Entertainment, Scheirer) illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. The parameterization yielded a significant reduction of the error rate: more than 30% relative improvement over MFCC parameterization was obtained on the tasks considered.
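
To make the idea of per-band energies from a wavelet decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Haar wavelet (chosen only because it is simple to write out), a three-level decomposition, and the log of the mean energy as the energy measure. The function names `haar_dwt_step` and `band_log_energies`, the `levels` parameter, and the `floor` constant are all hypothetical; the abstract does not specify which wavelet families, depths, or energy definitions were used.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. Assumes len(signal) is even.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def band_log_energies(frame, levels=3, floor=1e-10):
    """Log mean energy of each wavelet band for one analysis frame.

    Output order: detail bands from the finest (highest frequencies) to the
    coarsest, followed by the final approximation band. The log-mean-energy
    definition and the floor value are illustrative assumptions.
    """
    if len(frame) == 0 or len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be a non-zero multiple of 2**levels")
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        mean_energy = sum(c * c for c in detail) / len(detail)
        energies.append(math.log(mean_energy + floor))
    mean_energy = sum(c * c for c in approx) / len(approx)
    energies.append(math.log(mean_energy + floor))
    return energies


if __name__ == "__main__":
    # A rapidly alternating frame puts its energy in the finest detail band.
    print(band_log_energies([1.0, -1.0] * 4, levels=3))
```

A feature vector of this kind, computed frame by frame, would play the role that cepstral coefficients play in an MFCC front end; dynamic or long-term parameters could then be derived from the sequence of such vectors.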