In this paper, we present a new approach to high-performance speech/music discrimination on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, an artificial neural network (ANN) trained only on clean speech (as used in a standard large-vocabulary speech recognition system) serves as a channel model, at the output of which entropy and "dynamism" are measured every 10 ms. These features are then integrated over time by an ergodic two-state (speech and non-speech) hidden Markov model (HMM) with a minimum duration constraint on each state. The rationale is clear and observed in practice: since the network is trained on speech alone, the entropy at its output is, on average, larger for non-speech segments than for speech segments. In our case, the ANN acoustic model is a multi-layer perceptron (MLP, as commonly used in hybrid HMM/ANN systems) whose outputs estimate the phonetic posterior probabilities of the acoustic vectors presented at its input. It is from these outputs, i.e., from proper probabilities, that the entropy and dynamism are estimated. The two-state speech/non-speech HMM takes these two-dimensional features (entropy and dynamism), whose distributions are modeled by multi-Gaussian densities or by a secondary MLP. The parameters of this HMM are trained in a supervised manner using the Viterbi algorithm. Although the proposed method can easily be adapted to other speech/non-speech discrimination applications, the present paper focuses on speech/music segmentation. Experiments covering different speech and music styles, as well as different temporal distributions of the speech and music signals (real data distribution, mostly speech, or mostly music), illustrate the robustness of the approach, which always achieves a correct-segmentation performance above 90%. Finally, we show how a confidence measure can be used to further improve the segmentation results, and discuss how it may be used to extend the technique to speech/music mixtures.
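To make the pipeline concrete, the sketch below (ours, not the authors' code) shows the two main steps in Python with numpy: computing per-frame entropy and dynamism from MLP posterior probabilities, and decoding a two-state speech/non-speech sequence with a minimum-duration constraint by expanding each state into tied sub-states before a Viterbi pass. The function names, the self-loop probability, and the exact normalization of the dynamism measure are illustrative assumptions, not taken from the paper.

import numpy as np

def entropy_dynamism(posteriors):
    # posteriors: (T, K) array of MLP phonetic posteriors, rows summing to 1,
    # one row per 10 ms frame. Returns a (T, 2) feature matrix.
    eps = 1e-12
    p = np.clip(posteriors, eps, 1.0)
    entropy = -np.sum(p * np.log(p), axis=1)
    # Dynamism as the mean squared frame-to-frame change of the posteriors
    # (normalization by K is our assumption).
    diff = np.diff(posteriors, axis=0, prepend=posteriors[:1])
    dynamism = np.mean(diff ** 2, axis=1)
    return np.stack([entropy, dynamism], axis=1)

def viterbi_min_duration(log_lik, min_dur, self_loop=0.9):
    # log_lik: (T, 2) per-frame log-likelihoods under the speech (col 0) and
    # non-speech (col 1) emission models, e.g. multi-Gaussian densities fitted
    # to the (entropy, dynamism) features. Each class is expanded into min_dur
    # sub-states traversed left-to-right, which enforces the minimum duration.
    T = log_lik.shape[0]
    S = 2 * min_dur
    state_class = np.arange(S) // min_dur  # 0 = speech, 1 = non-speech
    logA = np.full((S, S), -np.inf)
    for c in (0, 1):
        base = c * min_dur
        for i in range(min_dur - 1):
            logA[base + i, base + i + 1] = 0.0  # forced advance
        last = base + min_dur - 1
        logA[last, last] = np.log(self_loop)           # stay in class
        logA[last, (1 - c) * min_dur] = np.log(1.0 - self_loop)  # switch
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_lik[0, 0]        # may start in either class
    delta[0, min_dur] = log_lik[0, 1]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + log_lik[t, state_class]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return state_class[path]  # per-frame class labels

With 10 ms frames, a minimum duration of, say, 0.5 s corresponds to min_dur = 50; the actual durations, emission models, and transition probabilities used in the paper would be set from the training data rather than the placeholder values above.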