Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information

Authors:
Jose Pardo; Xavier Anguera; Chuck Wooters
Venue:
IEEE Transactions on Computers
Year:
2007

Citing 8
Cited 11

A Robust Method for Speech Signal Time-Delay Estimation in Reverberant Rooms

ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) - Volume 1
Robust speaker segmentation for meetings: the ICSI-SRI spring 2005 diarization system

MLMI'05 Proceedings of the Second international conference on Machine Learning for Multimodal Interaction
Further progress in meeting recognition: the ICSI-SRI spring 2005 speech-to-text evaluation system

MLMI'05 Proceedings of the Second international conference on Machine Learning for Multimodal Interaction
The 2004 ICSI-SRI-UW meeting recognition system

MLMI'04 Proceedings of the First international conference on Machine Learning for Multimodal Interaction
Automatic cluster complexity and quantity selection: towards robust speaker diarization

MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction
Speaker diarization for multi-microphone meetings using only between-channel differences

MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction
Robust speaker diarization for meetings: ICSI RT06S meetings evaluation system

MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction
An overview of automatic speaker diarization systems

IEEE Transactions on Audio, Speech, and Language Processing

Visual speaker localization aided by acoustic models

MM '09 Proceedings of the 17th ACM international conference on Multimedia
An information theoretic approach to speaker diarization of meeting data

IEEE Transactions on Audio, Speech, and Language Processing
Identification of Soundbite and Its Speaker Name Using Transcripts of Broadcast News Speech

ACM Transactions on Asian Language Information Processing (TALIP)
Online speech/music segmentation based on the variance mean of filter bank energy

EURASIP Journal on Advances in Signal Processing
BIC-based speaker segmentation using divide-and-conquer strategies with application to speaker diarization

IEEE Transactions on Audio, Speech, and Language Processing
Dialocalization: Acoustic speaker diarization and visual localization as joint optimization problem

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Improving speech processing trough social signals: automatic speaker segmentation of political debates using role based turn-taking patterns

Proceedings of the 2nd international workshop on Social signal processing
Sound Source DOA Estimation and Localization in Noisy Reverberant Environments Using Least-Squares Support Vector Machines

Journal of Signal Processing Systems
Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features

Speech Communication
A review on speaker diarization systems and approaches

Speech Communication
Acoustic classification and segmentation using modified spectral roll-off and variance-based features

Digital Signal Processing

Quantified Score

Hi-index	14.98

Abstract

Human-machine interaction in meetings requires localizing and identifying the speakers who interact with the system, as well as recognizing the words spoken. Rich transcription research is an important step toward this goal; it covers speaker diarization together with the annotation of sentence boundaries and the removal of speaker disfluencies. Speaker diarization attempts to determine the number of participants in a meeting and to produce a list of speech time intervals for each participant. In this paper, we analyze the correlation between signals coming from multiple microphones and propose an improved method for speaker diarization of meetings recorded with multiple distant microphones. The proposed algorithm combines acoustic information with information from the delays between signals coming from the different sources. Using this procedure, we achieved state-of-the-art performance in the NIST Spring 2006 Rich Transcription evaluation, improving the Diarization Error Rate (DER) by 15% to 20% relative to previous systems.
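
The abstract does not say how the delays between microphone signals are estimated. As an illustration only, the sketch below uses generalized cross-correlation with phase transform (GCC-PHAT), a widely used delay estimator, and should not be read as the authors' implementation. The function name `gcc_phat_delay`, the 16 kHz sampling rate, and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig` lags `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so circular correlation does not wrap around
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, normalized to unit magnitude (the PHAT weighting),
    # so that only phase information contributes to the correlation peak.
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)  # synthetic stand-in for a reference channel
    true_lag = 12  # delay in samples applied to the second channel
    sig = np.concatenate((np.zeros(true_lag), ref[:-true_lag]))
    tau = gcc_phat_delay(sig, ref, fs, max_tau=0.01)
    print("estimated delay in samples:", round(tau * fs))
```

In a multi-microphone setting, one delay per channel pair and analysis window could be stacked into a feature vector and used alongside the acoustic features; how the two information streams are weighted is a design decision that the abstract does not specify.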