Speech/Music Discrimination Based on Discrete Wavelet Transform

Authors:
Stavros Ntalampiras;Nikos Fakotakis
Affiliations:
Electrical and Computer Engineering Department, Wire Communication Laboratory, University of Patras, Patras, Greece 26500 Rio;Electrical and Computer Engineering Department, Wire Communication Laboratory, University of Patras, Patras, Greece 26500 Rio
Venue:
SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
Year:
2008

Abstract

In this paper we present an effective approach to speech/music discrimination. Our architecture approaches the problem with the aim of improving the performance of a speech recognition system by excluding non-speech information from processing. Multiresolution analysis is applied to the input signal, and the most significant statistical features are calculated over a texture window of predefined size. These features are then modeled using Gaussian mixture models (GMMs), a state-of-the-art technique for probability density function estimation. The next step of our implementation is a classification scheme based on a conventional maximum likelihood decision. Although our system relies solely on wavelet signal processing, it demonstrated very good performance, achieving a 91.8% recognition rate.
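
The pipeline described in the abstract (wavelet decomposition, statistics per texture window, one density model per class, maximum likelihood decision) can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: the Haar wavelet, three decomposition levels, the two statistics per band (mean absolute value and standard deviation), and a single diagonal Gaussian per class as a one-component stand-in for a full GMM. The function names and the synthetic "tonal" and "noisy" windows are hypothetical and do not represent real speech or music.

```python
import math
import random

def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT: [detail_1, ..., detail_L, approx_L].
    Assumes len(signal) >= 2**levels so that no band ends up empty."""
    bands, approx, s = [], list(signal), math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) % 2:                      # drop an odd trailing sample
            approx = approx[:-1]
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        bands.append([(a - b) / s for a, b in pairs])   # detail coefficients
        approx = [(a + b) / s for a, b in pairs]        # coarser approximation
    bands.append(approx)
    return bands

def window_features(window, levels=3):
    """Assumed statistics per band: mean absolute value and standard deviation."""
    feats = []
    for band in haar_dwt(window, levels):
        n = len(band)
        mean = sum(band) / n
        feats.append(sum(abs(c) for c in band) / n)
        feats.append(math.sqrt(sum((c - mean) ** 2 for c in band) / n))
    return feats

def fit_gaussian(vectors):
    """Diagonal Gaussian fit (a one-component simplification of a GMM)."""
    n, d = len(vectors), len(vectors[0])
    mu = [sum(v[j] for v in vectors) / n for j in range(d)]
    var = [sum((v[j] - mu[j]) ** 2 for v in vectors) / n + 1e-6 for j in range(d)]
    return mu, var

def log_likelihood(x, model):
    """Log-density of feature vector x under a diagonal Gaussian model."""
    mu, var = model
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))

def classify(x, models):
    """Maximum likelihood decision: pick the class whose model scores x highest."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

def toy_window(kind, rng, n=64):
    """Hypothetical stand-in data: a noisy low-frequency tone or white noise."""
    if kind == "tonal":
        phase = rng.uniform(0.0, 2.0 * math.pi)
        return [math.sin(2.0 * math.pi * i / 32.0 + phase) + rng.gauss(0.0, 0.05)
                for i in range(n)]
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Train one model per toy class, then classify a fresh window.
rng = random.Random(0)
models = {
    kind: fit_gaussian([window_features(toy_window(kind, rng)) for _ in range(40)])
    for kind in ("tonal", "noisy")
}
predicted = classify(window_features(toy_window("tonal", rng)), models)
```

In a real system the single Gaussian would presumably be replaced by a multi-component mixture trained with expectation-maximization, and the toy windows by labeled speech and music recordings.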