Live speaker identification in conversations

Authors:
Gerald Friedland;Oriol Vinyals
Affiliations:
International Computer Science Institute, Berkeley, CA, USA;University of California San Diego, San Diego, CA, USA
Venue:
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Year:
2008

Abstract

This article describes our technical demonstration of an online speaker identification system for conversations. A laptop with an internal microphone is placed at the center of a meeting-room table. The system identifies the current speaker independently of the spoken text or language, with a latency of about 1.5 seconds and an accuracy of about 85% (as evaluated against the NIST RT benchmark). A Java GUI shows an image of the current speaker along with a timeline of past speakers. Speakers are added to the system's database using a one-minute training procedure.