A comparison of features for speech, music discrimination

Authors:
M. J. Carey;E. S. Parris;H. Lloyd-Thomas
Affiliations:
Ensigma Ltd., Chepstow, UK;-;-
Venue:
ICASSP '99 Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Year:
1999

Citing 0
Cited 25

Visualizing music and audio using self-similarity

MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Pause concepts for audio segmentation at different semantic levels

MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia
Speech/music segmentation using entropy and dynamism features in a HMM classification framework

Speech Communication
Automatic classification of speech and music using neural networks

Proceedings of the 2nd ACM international workshop on Multimedia databases
A fusion study in speech / music classification

ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
Automatic discrimination between laughter and speech

Speech Communication
Adaptive network-based fuzzy inference system vs. other classification algorithms for warped LPC-based speech/music discrimination

Engineering Applications of Artificial Intelligence
Audio classification in speech and music: a comparison between a statistical and a neural approach

EURASIP Journal on Applied Signal Processing
Robust in-car speech recognition based on nonlinear multiple regressions

EURASIP Journal on Applied Signal Processing
Speech/Music Classification Based on Distributed Evolutionary Fuzzy Logic for Intelligent Audio Coding

IbPRIA '07 Proceedings of the 3rd Iberian conference on Pattern Recognition and Image Analysis, Part II
Automatic music boundary detection using short segmental acoustic similarity in a music piece

EURASIP Journal on Audio, Speech, and Music Processing - Intelligent Audio, Speech, and Music Processing Applications
New speech/music discrimination approach based on fundamental frequency estimation

Multimedia Tools and Applications
A decision-tree-based algorithm for speech/music classification and segmentation

EURASIP Journal on Audio, Speech, and Music Processing
A wavelet-based parameterization for speech/music discrimination

Computer Speech and Language
Environmental sound recognition with time-frequency audio features

IEEE Transactions on Audio, Speech, and Language Processing
Noise robust features for speech/music discrimination in real-time telecommunication

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Two-stage cascaded classification approach based on genetic fuzzy learning for speech/music discrimination

Engineering Applications of Artificial Intelligence
Detecting semantic concepts from video using temporal gradients and audio classification

CIVR'03 Proceedings of the 2nd international conference on Image and video retrieval
Online speech/music segmentation based on the variance mean of filter bank energy

EURASIP Journal on Advances in Signal Processing
Speech/music discrimination in audio podcast using structural segmentation and timbre recognition

CMMR'10 Proceedings of the 7th international conference on Exploring music contents
First steps to an audio ontology-based classifier for telemedicine

ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
Toward a sound analysis system for telemedicine

FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Hybrid active learning for reducing the annotation effort of operators in classification systems

Pattern Recognition
Dictionary learning based sparse coefficients for audio classification with max and average pooling

Digital Signal Processing
Speech/music discrimination via energy density analysis

SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing

Abstract

Several approaches have previously been taken to the problem of discriminating between speech and music signals. These have used different features as the input to the classifier and have been trained and tested on different material. In this paper we examine the discrimination achieved by several different features using common training and test sets and the same classifier. The database assembled for these tests includes speech from thirteen languages and music from all over the world. In each case the distributions in the feature space were modelled by a Gaussian mixture model. Experiments were carried out on four types of feature: amplitude, cepstra, pitch and zero-crossings. In each case the derivative of the feature was also used and found to improve performance. The best performance resulted from using the cepstra and delta cepstra, which gave an equal error rate (EER) of 1.2%. This was closely followed by normalised amplitude and delta amplitude, which, however, used a much less complex model. The pitch and delta pitch gave an EER of 4%, which was better than the zero-crossings, which produced an EER of 6%.
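
The equal error rate quoted above is the operating point at which the two kinds of error (speech taken for music, and music taken for speech) occur equally often. The following is a minimal sketch of how such a figure could be estimated from classifier scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "speech", and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(speech_scores, music_scores):
    """Approximate the EER for a detector where higher scores mean 'speech'.

    Hypothetical helper: sweeps every observed score as a threshold and
    returns the error rate where misses and false alarms are closest.
    """
    thresholds = sorted(set(speech_scores) | set(music_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # Miss: a speech segment scored below the threshold.
        miss = sum(s < t for s in speech_scores) / len(speech_scores)
        # False alarm: a music segment scored at or above the threshold.
        false_alarm = sum(m >= t for m in music_scores) / len(music_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss + false_alarm) / 2
    return best_eer


# Made-up scores for illustration only (not data from the paper).
speech = [2.1, 1.7, 0.9, 1.3, -0.2]
music = [-1.5, -0.8, 0.1, -2.0, 1.0]
print(equal_error_rate(speech, music))  # prints 0.2, i.e. an EER of 20%
```

With a Gaussian mixture model per class, as in the abstract, the score for a segment would typically be a log-likelihood ratio between the speech and music models; that choice is also an assumption here, since the abstract does not specify the scoring rule.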