Progress in the AMIDA Speaker Diarization System for Meeting Data

  • Authors:
  • David A. van Leeuwen;Matej Konečný

  • Affiliations:
  • TNO Human Factors, Soesterberg, The Netherlands 3769 ZG;TNO Human Factors, Soesterberg, The Netherlands 3769 ZG

  • Venue:
  • Multimodal Technologies for Perception of Humans
  • Year:
  • 2008

Abstract

In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. We place the system in the context of its own history and of other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0%, accounting for overlapping speech. However, we find that a slight change to the Speech Activity Detection models has a large impact on the speaker DER.
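
To make the DER figure concrete, the following is a minimal frame-based sketch of how a diarization error rate that counts overlapping speech can be computed. It is not the authors' scoring setup: official NIST scoring is time-based and determines an optimal mapping between reference and system speakers, whereas this sketch assumes the hypothesis labels have already been mapped to reference labels. The function name `diarization_error_rate` and the per-frame set representation are assumptions made for illustration.

```python
def diarization_error_rate(reference, hypothesis):
    """Frame-based DER sketch.

    reference, hypothesis: equal-length lists with one set of active
    speaker labels per frame (an empty set means non-speech).
    Assumption: hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        total += n_ref                               # scored speaker frames, overlap counted per speaker
        missed += max(0, n_ref - n_hyp)              # reference speakers with no system speaker
        false_alarm += max(0, n_hyp - n_ref)         # system speakers with no reference speaker
        confusion += min(n_ref, n_hyp) - n_correct   # speech attributed to the wrong speaker

    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Toy example: five frames, one overlapped frame, one non-speech frame.
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"B"}]
der = diarization_error_rate(reference, hypothesis)
```

In the toy example, one frame is confused, one overlapping speaker is missed, and one non-speech frame is a false alarm. The false alarm is the kind of error that Speech Activity Detection contributes directly to the DER.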