We present the design and results of the Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation, the fourth in a series of community-wide evaluations of language technologies in the meeting domain. For 2006, we supported three evaluation tasks in two meeting sub-domains: the Speech-To-Text (STT) transcription task, and the “Who Spoke When” and “Speech Activity Detection” diarization tasks. The meetings were from the Conference Meeting and Lecture Meeting sub-domains. The lowest STT word error rate, with up to four simultaneous speakers, in the multiple distant microphone condition was 46.3% for the conference sub-domain and 53.4% for the lecture sub-domain. For the “Who Spoke When” task, the lowest diarization error rates for all speech were 35.8% and 24.0% for the conference and lecture sub-domains, respectively. For the “Speech Activity Detection” task, the lowest diarization error rates were 4.3% and 8.0% for the conference and lecture sub-domains, respectively.
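The word error rate (WER) reported for the STT task is the standard edit-distance metric: the minimum number of word substitutions, deletions, and insertions needed to turn the system hypothesis into the reference transcript, divided by the number of reference words. A minimal sketch (not the official NIST scoring tool, which also handles alignment across overlapping speakers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,                 # deletion
                          d[i][j - 1] + 1,                 # insertion
                          d[i - 1][j - 1] + cost)          # match/substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" yields one substitution and one deletion over four reference words, a WER of 50%. In the evaluation, WER is computed over the full test set (total errors over total reference words), not averaged per utterance.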