Speaker Diarization for Conference Room: The UPC RT07s Evaluation System

  • Authors:
  • Jordi Luque;Xavier Anguera;Andrey Temko;Javier Hernando

  • Affiliations:
  • Technical University of Catalonia (UPC), Barcelona, Spain 08034;Multilinguism group, Telefónica I+D, Barcelona, Spain 08021;Technical University of Catalonia (UPC), Barcelona, Spain 08034;Technical University of Catalonia (UPC), Barcelona, Spain 08034

  • Venue:
  • Multimodal Technologies for Perception of Humans
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper the authors present the UPC speaker diarization system for the NIST Rich Transcription Evaluation (RT07s) [1] conducted on the conference environment. The presented system is based on the ICSI RT06s system, which employs agglomerative clustering with a modified Bayesian Criterion (BIC) measure to decide which pairs of clusters to merge and to determine when to stop merging clusters [2]. This is the first participation of the UPC in the RT Speaker Diarization Evaluation and the purpose of this work has been the consolidation of a baseline system which can be used in the future for further research in the field of diarization. We have introduced, as prior modules before the diarization system, an Speech/Non-Speech detection module based on a Support Vector Machine from UPC and a Wiener Filtering from an implementation of the QIO front-end. In the speech parameterization a Frequency Filtering (FF) of the filter-bank energies is applied instead the classical Discrete Cosine Transform in the Mel-Cepstrum analysis. In addition, it is introduced a small changes in the complexity selection algorithm and a new post-processing technique which process the shortest clusters at the end of each Viterbi segmentation.