A fusion study in speech / music classification

  • Authors:
  • J. Pinquier, J.-L. Rouas, R. André-Obrecht

  • Affiliations:
  • Institut de Recherche en Informatique de Toulouse (IRIT), CNRS, Toulouse, France (all authors)

  • Venue:
  • ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, Volume 2
  • Year:
  • 2003

Abstract

In this paper, we present and merge two speech/music classification approaches that we have developed. The first is a differentiated modeling approach based on spectral analysis and implemented with Gaussian mixture models (GMM). The second is based on three original features: entropy modulation, stationary segment duration, and number of segments, which are merged with the classical 4 Hz modulation energy. Our classification system is a fusion of the two approaches. It is divided into two classifications (speech/non-speech and music/non-music) and achieves 94% accuracy for speech detection and 90% for music detection with one second of input signal. Besides the spectral information and GMM classically used in speech/music discrimination, these simple parameters bring complementary and efficient information.
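
The paper itself does not include code; the Python sketch below only illustrates two of the ingredients the abstract names: a 4 Hz modulation-energy feature (a classical cue for the syllabic rhythm of speech) and GMM log-likelihood scoring for the speech/non-speech decision over a one-second window. All parameter values (frame length, filter band, number of mixture components) and function names are assumptions for illustration, not the authors' actual configuration.

```python
# Hedged sketch of two ingredients named in the abstract; parameters are assumed.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.mixture import GaussianMixture


def modulation_energy_4hz(signal, sr, frame_len=0.016):
    """Relative energy of the ~4 Hz modulation of the frame-energy envelope.

    Speech tends to show strong envelope modulation around 4 Hz (syllable
    rate); music usually does not. Frame length and filter band are assumed.
    """
    hop = int(frame_len * sr)
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    envelope = np.log1p(frames.std(axis=1))            # frame-energy envelope
    env_sr = 1.0 / frame_len                           # envelope sample rate (~62.5 Hz)
    # Band-pass the envelope around 4 Hz (3-5 Hz band, assumed design).
    b, a = butter(2, [3.0 / (env_sr / 2), 5.0 / (env_sr / 2)], btype="band")
    modulated = filtfilt(b, a, envelope - envelope.mean())
    return float(np.sum(modulated ** 2) / (np.sum(envelope ** 2) + 1e-9))


def train_gmm(features, n_components=16):
    """Fit one diagonal-covariance GMM per class on spectral feature vectors."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag").fit(features)


def classify(window_features, gmm_speech, gmm_nonspeech):
    """Speech/non-speech decision: compare average log-likelihoods of the
    two class models over all frames in a one-second window."""
    if gmm_speech.score(window_features) > gmm_nonspeech.score(window_features):
        return "speech"
    return "non-speech"


# Usage example with synthetic feature vectors (12-dimensional, MFCC-like):
# rng = np.random.default_rng(0)
# gmm_s = train_gmm(rng.normal(0.0, 1.0, (500, 12)))
# gmm_n = train_gmm(rng.normal(2.0, 1.0, (500, 12)))
# print(classify(rng.normal(0.0, 1.0, (60, 12)), gmm_s, gmm_n))
```

The same two-model likelihood comparison would apply to the music/non-music classification; fusing its output with the feature-based cues (entropy modulation, segment statistics, 4 Hz energy) is the combination step the paper evaluates.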