Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora

Authors:
Rongqing Huang; J. H. L. Hansen
Affiliations:
Robust Speech Processing Group, University of Colorado, Boulder, CO, USA; -
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 10

MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation
Journal of Systems and Software

Classification of audio signals using SVM and RBFNN
Expert Systems with Applications: An International Journal

Online speech/music segmentation based on the variance mean of filter bank energy
EURASIP Journal on Advances in Signal Processing

BIC-based speaker segmentation using divide-and-conquer strategies with application to speaker diarization
IEEE Transactions on Audio, Speech, and Language Processing

Classification of audio signals using AANN and GMM
Applied Soft Computing

Dynamic neural networks applied to melody retrieval
MICAI'10 Proceedings of the 9th Mexican international conference on Artificial intelligence conference on Advances in soft computing: Part II

Pattern classification models for classifying and indexing audio signals
Engineering Applications of Artificial Intelligence

Combining evidence from temporal and spectral features for person recognition using humming
PerMIn'12 Proceedings of the First Indo-Japan conference on Perception and Machine Intelligence

Investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification
Speech Communication

A new methodology for music retrieval based on dynamic neural networks
International Journal of Hybrid Intelligent Systems

Abstract

Unsupervised audio classification and segmentation remains a challenging research problem that significantly affects automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper presents advances in 1) audio classification for speech recognition and 2) audio segmentation for unsupervised multispeaker change detection. A new audio classification algorithm is proposed, based on weighted Gaussian mixture model (GMM) networks (WGN). Two new extended-time features, variance of the spectrum flux (VSF) and variance of the zero-crossing rate (VZCR), are used to preclassify the audio and to supply weights to the output probabilities of the GMM networks; classification is then performed with the weighted GMM networks. Because historically no features have been designed specifically for audio segmentation, we evaluate 16 candidate features, including three newly proposed ones (perceptual minimum variance distortionless response (PMVDR), smoothed zero-crossing rate (SZCR), and filterbank log energy coefficients (FBLC)), in 14 noisy environments to determine which features are most robust on average across these conditions. Next, a new distance metric, T2-mean, is proposed, which is intended to improve segmentation for short segment turns (i.e., 1-5 s). A new false alarm compensation procedure is also implemented, which significantly reduces the false alarm rate at little cost to the miss rate. Evaluations on a standard data set, the Defense Advanced Research Projects Agency (DARPA) Hub4 Broadcast News 1997 evaluation data, show that the WGN classification algorithm achieves over a 50% improvement versus the GMM network baseline algorithm, and that the proposed compound segmentation algorithm achieves a 23%-10% improvement in all metrics versus the baseline of Mel-frequency cepstral coefficients (MFCC) with the traditional Bayesian information criterion (BIC) algorithm.
The new classification and segmentation algorithms also obtain very satisfactory results on the more diverse and challenging National Gallery of the Spoken Word (NGSW) corpus.
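
To make the idea of an extended-time feature more concrete, the following is a minimal sketch of a variance-of-zero-crossing-rate computation. It is not the authors' implementation: the abstract does not give the exact definition of VZCR, so the framing scheme, the frame length (`frame_len=200`, i.e. 25 ms at an assumed 8 kHz sampling rate), and the helper names `zero_crossing_rate`, `variance_of_zcr`, and `tone` are all assumptions made for illustration. The synthetic signals only show that a signal whose short-time character keeps changing yields a larger ZCR variance than a steady tone.

```python
import math
from statistics import pvariance


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def variance_of_zcr(signal, frame_len=200):
    """Variance of the per-frame ZCR over non-overlapping frames.

    frame_len is an illustrative assumption (25 ms at 8 kHz),
    not a value taken from the paper.
    """
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    return pvariance([zero_crossing_rate(f) for f in frames])


def tone(freq_hz, n_samples, sample_rate=8000):
    """Synthetic sine tone used only as toy input."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


# Steady 440 Hz tone: every frame has nearly the same ZCR.
steady = tone(440, 8000)

# Alternating low- and high-frequency bursts, one burst per frame,
# so the per-frame ZCR swings between two very different values.
alternating = []
for k in range(40):
    alternating += tone(150 if k % 2 == 0 else 2500, 200)

vzcr_steady = variance_of_zcr(steady)
vzcr_alternating = variance_of_zcr(alternating)
print(vzcr_steady, vzcr_alternating)
```

In this toy setting the alternating signal produces the larger value. How such a statistic is thresholded or turned into weights for the GMM network outputs is described in the paper itself and is not reproduced here.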