Several approaches have previously been taken to the problem of discriminating between speech and music signals. These have used different features as the input to the classifier and have been trained and tested on different material. In this paper we examine the discrimination achieved by several different features using common training and test sets and the same classifier. The database assembled for these tests includes speech from thirteen languages and music from all over the world. In each case the distributions in the feature space were modelled by a Gaussian mixture model. Experiments were carried out on four types of feature: amplitude, cepstra, pitch and zero-crossings. In each case the derivative of the feature was also used and found to improve performance. The best performance resulted from using the cepstra and delta cepstra, which gave an equal error rate (EER) of 1.2%. This was closely followed by normalised amplitude and delta amplitude, which, however, used a much less complex model. The pitch and delta pitch gave an EER of 4%, which was better than the zero-crossings, which produced an EER of 6%.
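As a rough illustration of two ingredients named in the abstract, the Python sketch below computes a derivative ("delta") of a per-frame feature track and estimates an equal error rate from two sets of classifier scores. The function names, the central-difference form of the delta, and the toy scores are assumptions made for illustration and are not taken from the paper. In the setting described above, the scores would typically be something like a log-likelihood ratio between a speech Gaussian mixture model and a music one.

```python
from typing import List, Sequence, Tuple


def delta(track: Sequence[float]) -> List[float]:
    """Approximate the time derivative of a per-frame feature track.

    Assumption: a central difference inside the track and a one-sided
    difference at the two edges; the paper may compute deltas differently.
    """
    n = len(track)
    if n < 2:
        return [0.0] * n
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((track[hi] - track[lo]) / (hi - lo))
    return out


def equal_error_rate(
    speech_scores: Sequence[float], music_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold); a higher score means "more speech-like".

    The threshold is swept over every observed score. The EER is read off
    where the miss rate (speech rejected) and the false-alarm rate (music
    accepted as speech) are closest, and reported as their average.
    """
    candidates = sorted(set(speech_scores) | set(music_scores))
    best = None
    for t in candidates:
        miss = sum(s < t for s in speech_scores) / len(speech_scores)
        false_alarm = sum(m >= t for m in music_scores) / len(music_scores)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy values only; these are not results from the paper.
    print(delta([1.0, 2.0, 4.0]))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6]))
```

With only a handful of scores the two error rates rarely cross exactly, which is why the sketch averages them at the closest point; on a large test set the two values converge.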