A unified framework for domain independent online speaker indexing in eigen-voice space using an index tree of reference models

Authors:
M. H. Moattar;M. M. Homayounpour
Affiliations:
Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran;Laboratory for Intelligent Sound and Speech Processing, Computer Engineering and Information Technology Dept., Amirkabir University of Technology, Tehran, Iran
Venue:
International Journal of Speech Technology
Year:
2013


Abstract

Speaker indexing, referred to in the literature as speaker diarization, is an important task in audio indexing and retrieval. It comprises two important and usually separate stages: speaker segmentation and speaker clustering. Speaker indexing approaches can be divided into online and offline categories. This paper focuses on domain-independent online speaker indexing. For this purpose, the proposed framework should be parameter free, requiring no application-specific parameters such as utterance duration or threshold settings. To reduce dependency on parameters, traditional speaker segmentation is reformulated as a voting-based homogeneous speech segmentation, in which several approaches are applied in parallel to decide whether a change point exists. In online indexing, data insufficiency is encountered at each time slice. In the proposed framework, a set of reference speaker models is used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in the eigen-voice decomposition space are proposed. To reduce the computational cost of tracking, an index structure over the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on the 2002 Rich Transcription Broadcast News and Conversational Telephone Speech Corpus (Garofolo, NIST Rich Transcription, 2002) as well as on a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news and the synthetic dataset are 7.51 %, 6.36 % and 9.34 %, respectively. In addition, the index tree structure improves the tracking run time of the proposed framework by 32 %.
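
The voting idea mentioned in the abstract can be illustrated with a minimal sketch: several change detectors are applied to the same pair of adjacent windows, and a change point is declared only when a majority agree. This is not the authors' method. The abstract does not say which detectors are combined or how votes are weighted, so the three toy detectors, their fixed thresholds, and all names below are assumptions made purely for illustration (the paper itself aims to avoid such hand-set thresholds).

```python
from statistics import mean, median, pvariance
from typing import Callable, Sequence

# A detector compares two adjacent windows of a 1-D feature and votes
# True if it believes the windows come from different sources.
Detector = Callable[[Sequence[float], Sequence[float]], bool]


def mean_shift(left: Sequence[float], right: Sequence[float]) -> bool:
    # Hypothetical detector: large difference between window means.
    return abs(mean(left) - mean(right)) > 1.0  # illustrative threshold


def median_shift(left: Sequence[float], right: Sequence[float]) -> bool:
    # Hypothetical detector: large difference between window medians.
    return abs(median(left) - median(right)) > 1.0  # illustrative threshold


def variance_ratio(left: Sequence[float], right: Sequence[float]) -> bool:
    # Hypothetical detector: one window is much more spread out than the other.
    v_left, v_right = pvariance(left), pvariance(right)
    smaller, larger = min(v_left, v_right), max(v_left, v_right)
    return larger > 4.0 * max(smaller, 1e-12)  # illustrative threshold


def vote_change_point(
    left: Sequence[float],
    right: Sequence[float],
    detectors: Sequence[Detector],
) -> bool:
    # Declare a change point only if a strict majority of detectors agree.
    votes = sum(1 for detect in detectors if detect(left, right))
    return 2 * votes > len(detectors)


DETECTORS = [mean_shift, median_shift, variance_ratio]
```

With these toy detectors, two windows that differ only by a large offset receive two of three votes (mean and median shift) and are flagged as a change, while two identical windows receive none.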