In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The current system is based on the ICSI-SRI clustering system for Broadcast News (BN), with extra modules to handle the different meeting tasks in which we participated. Our base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure, both to decide which pairs of clusters to merge and to determine when to stop merging. This approach does not require any pre-trained models, which increases robustness and simplifies the port from BN to the meetings domain. For the meetings domain, we have added several features to our baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay-and-sum beamforming algorithm that enhances signal quality for the multiple distant microphones (MDM) sub-task. In post-evaluation work we further improved the delay-and-sum algorithm, experimented with a new speech/non-speech detector, and proposed a new system for the lecture room environment.
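To make the delay-and-sum idea concrete, the following is a minimal sketch, not the system described above. It assumes integer sample delays, takes the first channel as the reference, and estimates each delay by a brute-force time-domain cross-correlation, which is a simplification chosen for illustration. The function names `estimate_delay` and `delay_and_sum` and the `max_lag` parameter are hypothetical.

```python
def estimate_delay(reference, channel, max_lag):
    """Return the lag in samples, searched over [-max_lag, max_lag], that
    maximizes the cross-correlation between `reference` and `channel`.
    A positive lag means `channel` is delayed relative to `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(channel):
                score += reference[i] * channel[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=20):
    """Align every channel to the first one and average the aligned samples.

    Assumption for this sketch: all channels are equal-length lists of floats.
    Returns the enhanced signal and the estimated per-channel delays."""
    reference = channels[0]
    # The reference channel is aligned with itself by definition.
    delays = [0] + [estimate_delay(reference, ch, max_lag) for ch in channels[1:]]
    output = []
    for i in range(len(reference)):
        total, count = 0.0, 0
        for ch, d in zip(channels, delays):
            j = i + d  # undo the estimated delay of this channel
            if 0 <= j < len(ch):
                total += ch[j]
                count += 1
        output.append(total / count)
    return output, delays
```

Averaging the aligned channels reinforces the speech component, which adds coherently, while uncorrelated noise on the individual microphones partially cancels. A practical system would typically estimate delays on short windows and in the frequency domain rather than once over the whole signal as done here.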