We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year's system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved through several enhancements to the background (CTS) models, including added training data, decision-tree-based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons (MLPs). In particular, we adapt both the acoustic models and the MLP features to the meeting domain. For distant-microphone recognition, we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. Despite a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall relative improvement of 17% in both the multiple distant microphone (MDM) and individual headset microphone (IHM) conditions compared to last year's evaluation system. Results on lecture data are comparable to the best reported results for that task.
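The abstract does not spell out how the delay-sum processing of distant microphone channels is implemented. As a rough illustration of the general idea only, the sketch below aligns each channel to a reference channel and averages the aligned samples. The function names `estimate_delay` and `delay_and_sum`, the `max_lag` parameter, and the plain cross-correlation delay estimate are all assumptions made for this example, not details of the evaluated system, which may well use a more robust delay estimator and per-window processing.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive lag means sig is a delayed copy of ref.
    Uses plain cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to channels[0] and average sample by sample.

    Returns the enhanced signal and the estimated lag of each channel.
    """
    ref = channels[0]
    # The reference channel is aligned with itself by definition.
    lags = [0] + [estimate_delay(ref, ch, max_lag) for ch in channels[1:]]
    out = []
    for n in range(len(ref)):
        total, count = 0.0, 0
        for ch, lag in zip(channels, lags):
            m = n + lag
            if 0 <= m < len(ch):  # skip samples shifted out of range
                total += ch[m]
                count += 1
        out.append(total / count)
    return out, lags


# Hypothetical toy input: one pulse seen with different delays per channel.
ref = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0]
delayed = [0, 0] + ref[:-2]   # arrives two samples late
advanced = ref[1:] + [0]      # arrives one sample early
enhanced, lags = delay_and_sum([ref, delayed, advanced], max_lag=4)
```

Averaging after alignment reinforces the speech that is coherent across channels while uncorrelated noise partially cancels, which is the usual motivation for this kind of preprocessing ahead of recognition.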