HCII'11 Proceedings of the 1st international conference on Human interface and the management of information: interacting with information - Volume Part II
This paper proposes a speaker diarization method for determining "who spoke when" in multi-party conversations, based on the probabilistic fusion of audio and visual location information. The audio and visual information is obtained from a compact system designed to analyze round-table multi-party conversations. The system consists of two cameras and a triangular array of three microphones, and can cover a spherical region around the device. Speaker locations are estimated from the audio and visual observations as azimuths relative to this recording system. Unlike conventional speaker diarization methods, which use a cascade of speech activity detection, direction-of-arrival estimation, acoustic feature extraction, and information-criterion-based speaker segmentation, the proposed method estimates the probability that multiple simultaneous speakers are present at locations in physical space using only this small microphone setup. To estimate speaker presence more reliably, the speech presence probabilities in physical space are integrated with probabilities derived from the participants' face locations, which are obtained with a robust particle-filter-based face tracker operating on the two fisheye-lens cameras. Locations with high integrated probabilities are then classified on-line into a number of speaker classes, yielding the diarization result. Because the probability calculations and speaker classifications are performed on-line, the method does not need to observe the entire conversation before producing output. An experiment using real casual conversations, which contain more overlaps and short speech segments than formal meetings, demonstrated the advantages of the proposed method.
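The core idea — fusing per-azimuth speech presence probabilities with tracked face locations, then assigning high-probability directions to speakers — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the bin resolution, the Gaussian visual model, the product-fusion rule, and the threshold are all hypothetical choices made here for clarity.

```python
import math

AZIMUTH_BINS = 36  # hypothetical resolution: one bin per 10 degrees of azimuth


def visual_bump(b, mu, sigma=1.5):
    """Unnormalized Gaussian bump around a tracked face, wrapped on the circle."""
    d = min(abs(b - mu), AZIMUTH_BINS - abs(b - mu))  # circular distance in bins
    return math.exp(-0.5 * (d / sigma) ** 2)


def fuse(audio_prob, face_bins):
    """Integrate per-bin speech presence probabilities with face locations.

    audio_prob: speech presence probability per azimuth bin (from the mic array)
    face_bins:  azimuth bins of faces reported by the video tracker
    Product fusion is just one simple choice of integration rule.
    """
    fused = []
    for b in range(AZIMUTH_BINS):
        visual = max((visual_bump(b, c) for c in face_bins), default=0.0)
        fused.append(audio_prob[b] * visual)
    return fused


def active_bins(fused, threshold=0.5):
    """On-line step: bins whose fused probability exceeds a threshold are
    treated as active speaker directions for the current frame."""
    return [b for b, p in enumerate(fused) if p > threshold]
```

Run frame by frame, the active bins would then be matched to persistent speaker classes (e.g., by nearest tracked face), which is what allows the method to handle overlapping speech: two bins can exceed the threshold in the same frame.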