Improvement to speech-music discrimination using sinusoidal model based features

Authors:
Jalil Shirazi;Shahrokh Ghaemmaghami
Affiliations:
Science & Research Branch, Islamic Azad University, Tehran, Iran;Sharif University of Technology, Tehran, Iran
Venue:
Multimedia Tools and Applications
Year:
2010

Abstract

This paper addresses model-based audio content analysis for classifying mixed speech-music audio signals into speech and music. A new set of features based on sinusoidal modeling of audio signals is presented and evaluated. The feature set includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model, which serve as measures of harmony and signal continuity, and it is introduced and discussed in detail. These features are used as inputs to an audio classifier and compared with typical features. Their performance is evaluated by classifying audio into speech and music with both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features discriminate speech from music effectively. Using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also confirm that the sinusoidal model features outperform the popular time-domain and frequency-domain features in audio classification.
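As an illustration of the two features named in the abstract, the following minimal sketch computes the variance of the birth frequencies and the duration of the longest frequency track for one 1-s segment. It assumes the sinusoidal analysis has already been run and has produced a list of tracks; the `Track` type, the `segment_features` function, the choice to count only tracks born inside the segment, and the clipping of tracks at the segment boundary are all hypothetical choices made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Tuple


@dataclass
class Track:
    """Hypothetical sinusoidal track: one partial followed across frames."""
    birth_time: float       # seconds, time of the track's first frame
    duration: float         # seconds, how long the track persists
    birth_frequency: float  # Hz, frequency at the track's first frame


def segment_features(tracks: List[Track], seg_start: float,
                     seg_len: float = 1.0) -> Tuple[float, float]:
    """Return (birth-frequency variance, longest track duration) for a segment.

    Assumption: only tracks born inside the segment are counted, and a
    track's duration is clipped at the end of the segment.
    """
    seg_end = seg_start + seg_len
    born = [t for t in tracks if seg_start <= t.birth_time < seg_end]
    if not born:
        return 0.0, 0.0  # no tracks born in this segment

    # Population variance of the frequencies at which tracks are born.
    birth_var = pvariance([t.birth_frequency for t in born])

    # Duration of the longest track, clipped to the segment boundary.
    longest = max(min(t.birth_time + t.duration, seg_end) - t.birth_time
                  for t in born)
    return birth_var, longest


# Toy input, invented purely to exercise the function.
example_tracks = [
    Track(birth_time=0.0, duration=0.8, birth_frequency=200.0),
    Track(birth_time=0.1, duration=0.3, birth_frequency=400.0),
    Track(birth_time=0.5, duration=1.0, birth_frequency=600.0),
    Track(birth_time=1.2, duration=0.4, birth_frequency=1000.0),
]
features = segment_features(example_tracks, seg_start=0.0)
```

The resulting two-element feature vector per segment is the kind of input that could then be passed to a GMM or SVM classifier, as the abstract describes.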