A Theory for Multiresolution Signal Decomposition: The Wavelet Representation
IEEE Transactions on Pattern Analysis and Machine Intelligence
The LIMSI Broadcast News transcription system
Speech Communication - Special issue on automatic transcription of broadcast news data
Construction and Evaluation of a Robust Multifeature Speech/Music Discriminator
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)-Volume 2 - Volume 2
Automatic classification of speech and music using neural networks
Proceedings of the 2nd ACM international workshop on Multimedia databases
Speech/Music Discrimination Based on Spectral Peak Analysis and Multi-layer Perceptron
ICHIT '06 Proceedings of the 2006 International Conference on Hybrid Information Technology - Volume 02
Real-time discrimination of broadcast speech/music
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 02
The Teager energy based feature parameters for robust speech recognition in car noise
ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 01
A comparison of features for speech, music discrimination
ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 01
Robust singing detection in speech/music discriminator design
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Detection of speech and music based on spectral tracking
Speech Communication
A speech/music discriminator based on RMS and zero-crossings
IEEE Transactions on Multimedia
Multigroup classification of audio signals using time-frequency parameters
IEEE Transactions on Multimedia
A pertinent learning machine input feature for speaker discrimination by voice
International Journal of Speech Technology
Speech/music discrimination via energy density analysis
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
Hi-index | 0.00 |
This paper addresses the problem of parameterization for speech/music discrimination. The current successful parameterization based on cepstral coefficients uses the Fourier transformation (FT), which is well adapted for stationary signals. In order to take into account the non-stationarity of music/speech signals, this work proposes to study wavelet-based signal decomposition instead of FT. Three wavelet families and several numbers of vanishing moments have been evaluated. Different types of energy, calculated for each frequency band obtained from wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterization are integrated into two class/non-class classifiers: one for speech/non-speech, one for music/non-music. Different experiments on realistic corpora, including different styles of speech and music (Broadcast News, Entertainment, Scheirer), illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. Our parameterization yielded a significant reduction of the error rate. More than 30% relative improvement was obtained for the envisaged tasks compared to MFCC parameterization.