We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing multiple distant microphones and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as by model adaptation specifically to nonnative and non-American speakers. The net effect of these enhancements was a 14-16% relative error reduction on distant microphones and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new "coffee break" meeting genre and on a new NIST metric designed to evaluate combined speech diarization and recognition.
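For readers unfamiliar with the term, a relative error reduction compares the drop in error rate to the baseline error rate rather than reporting the absolute difference. The following is a minimal sketch of that arithmetic; the function name and the example numbers are hypothetical and are not results from the paper.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative reduction in word error rate as a fraction.

    Both arguments are error rates on the same scale (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only; not taken from the paper.
# A drop from 40.0% to 34.0% WER is 6 points absolute, 15% relative.
example = relative_error_reduction(40.0, 34.0)
print(f"{example:.1%}")  # prints "15.0%"
```

Under this definition, the same absolute improvement counts for more when the baseline error rate is lower, which is why relative figures are usually quoted separately for each microphone condition.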