IEEE Transactions on Audio, Speech, and Language Processing
Unsupervised audio classification and segmentation remains a challenging research problem that significantly affects automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper presents advances in 1) audio classification for speech recognition and 2) audio segmentation for unsupervised multispeaker change detection. A new audio classification algorithm based on weighted GMM networks (WGN) is proposed. Two new extended-time features, variance of the spectrum flux (VSF) and variance of the zero-crossing rate (VZCR), are used to preclassify the audio and to supply weights to the output probabilities of the GMM networks; classification is then performed with the weighted GMM networks. Because historically no features have been designed specifically for audio segmentation, we evaluate 16 candidate features, including three newly proposed ones (perceptual minimum variance distortionless response (PMVDR), smoothed zero-crossing rate (SZCR), and filterbank log energy coefficients (FBLC)), in 14 noisy environments to determine which are most robust on average across these conditions. Next, a new distance metric, T2-mean, is proposed to improve segmentation for short segment turns (i.e., 1-5 s). A new false alarm compensation procedure is also implemented, which significantly reduces the false alarm rate at little cost to the miss rate. Evaluations on a standard data set, the Defense Advanced Research Projects Agency (DARPA) Hub4 Broadcast News 1997 evaluation data, show that the WGN classification algorithm achieves over a 50% improvement versus the GMM network baseline algorithm, and that the proposed compound segmentation algorithm achieves a 23%-10% improvement in all metrics versus the baseline that combines Mel-frequency cepstral coefficients (MFCC) with the traditional Bayesian information criterion (BIC) algorithm. The new classification and segmentation algorithms also obtain satisfactory results on the more diverse and challenging National Gallery of the Spoken Word (NGSW) corpus.
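The two extended-time features named in the abstract, VZCR and VSF, can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption: the frame length, hop size, window length, Hann weighting, spectrum normalization, and all function names are hypothetical choices made only to show the general idea of taking the variance of a frame-level feature over a longer window.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_flux(frames):
    """Euclidean distance between successive normalized magnitude spectra."""
    window = np.hanning(frames.shape[1])  # assumed analysis window
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    mag /= mag.sum(axis=1, keepdims=True) + 1e-12  # assumed normalization
    return np.sqrt(np.sum(np.diff(mag, axis=0) ** 2, axis=1))

def extended_time_variance(values, window):
    """Variance of a frame-level feature over non-overlapping windows."""
    n = len(values) // window
    return values[: n * window].reshape(n, window).var(axis=1)

# Synthetic demo: a steady tone versus a tone interrupted by noise bursts.
sr = 16000
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)
burst = (np.arange(t.size) // 1600) % 2 == 1  # alternate every 0.1 s
mixed = np.where(burst, noise, tone)

frame_len, hop, win = 400, 160, 50  # 25 ms frames, 10 ms hop, 0.5 s windows
tone_frames = frame_signal(tone, frame_len, hop)
mixed_frames = frame_signal(mixed, frame_len, hop)

vzcr_tone = extended_time_variance(zero_crossing_rate(tone_frames), win)
vzcr_mixed = extended_time_variance(zero_crossing_rate(mixed_frames), win)
vsf_tone = extended_time_variance(spectral_flux(tone_frames), win)
vsf_mixed = extended_time_variance(spectral_flux(mixed_frames), win)
```

In this toy setup the steady tone yields near-zero variance for both features, while the signal that alternates between tone and noise yields much larger values. That contrast is the kind of cue a preclassifier could exploit, though how the paper maps these features to GMM network weights is not described in the abstract.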