The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System

  • Authors:
  • Andreas Stolcke;Xavier Anguera;Kofi Boakye;Özgür Çetin;Adam Janin;Mathew Magimai-Doss;Chuck Wooters;Jing Zheng

  • Affiliations:
  • SRI International, Menlo Park, U.S.A. and International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;Yahoo, Inc.;International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;SRI International, Menlo Park, U.S.A.

  • Venue:
  • Multimodal Technologies for Perception of Humans
  • Year:
  • 2008

Abstract

We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing multiple distant microphones, and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as model adaptation specifically to nonnative and non-American speakers. The net effect of these enhancements was a 14-16% relative error reduction on distant microphones, and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new "coffee break" meeting genre, and on a new NIST metric designed to evaluate combined speech diarization and recognition.
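
For readers unfamiliar with the term, a relative error reduction expresses the drop in error rate as a fraction of the baseline error rate, rather than as an absolute difference in percentage points. The following is a minimal sketch of that arithmetic; the function name and the word error rates used are hypothetical and chosen only for illustration, and are not results from the paper.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the fraction of the baseline error removed by the improved system.

    Both arguments are error rates in the same unit (e.g. percent WER).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical error rates for illustration only; NOT figures from the paper.
baseline = 40.0   # assumed baseline word error rate, in percent
improved = 34.0   # assumed word error rate after improvements, in percent

reduction = relative_error_reduction(baseline, improved)
print(f"absolute reduction: {baseline - improved:.1f} points")
print(f"relative reduction: {reduction:.1%}")
```

Under these assumed numbers, a 6-point absolute drop corresponds to a relative reduction that falls within the 14-16% range quoted for distant microphones, which shows why the two ways of reporting should not be confused.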