Robust user context analysis for multimodal interfaces
ICMI '11: Proceedings of the 13th International Conference on Multimodal Interfaces
A serious problem with today's audio and video conferencing facilities is the difficulty of determining who is speaking among a large number of participants. There is a strong need for meeting-room infrastructure and teleconference facilities that improve the sense of presence and participation in remote meetings. We present a distributed multimodal tracking system that uses multiple cameras and microphones to automatically select the current speaker among multiple meeting participants. The system actively obtains and transmits video that gives a good view of the selected speaker. The tracking system is integrated into a web-based video conferencing application connecting seven meeting rooms around the globe. An important part of designing such a system is determining sensor placement and configuration through systematic experiments in the actual rooms where the system is deployed.
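The core selection step described above can be illustrated with a minimal sketch. This is not the paper's actual implementation; it assumes one microphone per participant and selects the participant whose smoothed microphone energy is highest, with a hysteresis margin so the transmitted camera view does not switch on every brief noise. All class and parameter names here are illustrative assumptions.

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)


class SpeakerSelector:
    """Illustrative sketch: pick the active speaker from per-mic energy.

    Assumes one microphone per participant; thresholds are made up for
    illustration, not taken from the described system.
    """

    def __init__(self, n_mics, smoothing=0.8, switch_margin=1.5):
        self.energies = [0.0] * n_mics      # smoothed energy per microphone
        self.smoothing = smoothing          # higher = slower reaction
        self.switch_margin = switch_margin  # challenger must be this much louder
        self.current = 0                    # index of the selected speaker

    def update(self, frames):
        """frames: one audio frame (list of samples) per microphone.

        Returns the index of the currently selected speaker.
        """
        for i, frame in enumerate(frames):
            e = frame_energy(frame)
            # Exponential smoothing to suppress transient noise spikes.
            self.energies[i] = (self.smoothing * self.energies[i]
                                + (1.0 - self.smoothing) * e)
        best = max(range(len(self.energies)), key=self.energies.__getitem__)
        # Hysteresis: switch only if the challenger is clearly louder,
        # so the outgoing video view is not toggled on every utterance.
        if (best != self.current and
                self.energies[best] > self.switch_margin * self.energies[self.current]):
            self.current = best
        return self.current
```

In a deployed system the selected index would drive camera selection and video transmission; the smoothing and margin values would be tuned per room, which is consistent with the abstract's point that sensor configuration must be determined experimentally in the actual deployment rooms.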