The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System

Authors:
Andreas Stolcke;Xavier Anguera;Kofi Boakye;Özgür Çetin;Adam Janin;Mathew Magimai-Doss;Chuck Wooters;Jing Zheng
Affiliations:
SRI International, Menlo Park, U.S.A. and International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;Yahoo, Inc.;International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;International Computer Science Institute, Berkeley, U.S.A.;SRI International, Menlo Park, U.S.A.
Venue:
Multimodal Technologies for Perception of Humans
Year:
2008

Abstract

We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing multiple distant microphones and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as by model adaptation specifically to nonnative and non-American speakers. The net effect of these enhancements was a 14-16% relative error reduction on distant microphones and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new "coffee break" meeting genre and on a new NIST metric designed to evaluate combined speaker diarization and recognition.
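
The abstract reports its gains as relative error reductions, which are easy to confuse with absolute differences in error rate. The following minimal sketch shows the arithmetic. The function name and the word error rates are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the reduction in error rate as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical word error rates in percent (NOT taken from the paper).
baseline = 40.0
improved = 34.0

print(f"{relative_error_reduction(baseline, improved):.1%} relative")
print(f"{baseline - improved:.1f} points absolute")
```

With these illustrative values, the same improvement is a 15% relative reduction but only 6 points absolute, which is why the abstract's figures cannot be converted to absolute error rates without the baseline numbers.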