This paper presents a real-time system for analyzing group meetings that uses a novel omnidirectional camera-microphone system. The goal is to automatically discover the visual focus of attention (VFOA), i.e. "who is looking at whom", in addition to performing speaker diarization, i.e. determining "who is speaking and when". First, a novel tabletop sensing device for round-table meetings is presented; it consists of two cameras with fisheye lenses and a triangular microphone array. Second, from the high-resolution omnidirectional images captured by the cameras, the positions and poses of participants' faces are estimated by STCTracker (Sparse Template Condensation Tracker), which achieves robust real-time tracking of multiple faces by exploiting GPUs (Graphics Processing Units). The face position/pose data output by the tracker is then used to estimate the focus of attention in the group. Using the microphone array, robust speaker diarization is carried out by VAD (Voice Activity Detection) and DOA (Direction of Arrival) estimation, followed by sound source clustering. The paper also presents new 3-D visualization schemes for meeting scenes and the results of an analysis. Using two PCs, one for vision processing and one for audio processing, the system runs at about 20 frames per second for five-person meetings.
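The abstract does not spell out how head pose maps to "who is looking at whom". As an illustration only, the following is a minimal geometric sketch of VFOA estimation for a round-table layout: each participant sits at a known azimuth around the sensor, a head yaw of zero means facing the table center, and the attention target is taken to be the other participant angularly closest to the gaze direction. The function name and the nearest-target rule are assumptions for illustration, not the authors' method.

```python
import math

def vfoa_targets(seat_az, head_yaw):
    """Estimate, for each participant, which other participant they look at.

    seat_az:  seating azimuths (radians) around the table center.
    head_yaw: head yaw angles (radians); 0 means facing the table center.

    A person at azimuth a facing the center gazes toward direction a + pi;
    head yaw rotates that gaze. The target is the other participant whose
    seat azimuth is angularly closest to the gaze direction.
    """
    targets = []
    for i, (a, y) in enumerate(zip(seat_az, head_yaw)):
        gaze = a + math.pi + y  # direction the face points, in table coordinates
        best, best_d = None, float("inf")
        for j, b in enumerate(seat_az):
            if j == i:
                continue
            # wrapped angular distance between gaze and seat azimuth
            d = abs(math.atan2(math.sin(gaze - b), math.cos(gaze - b)))
            if d < best_d:
                best, best_d = j, d
        targets.append(best)
    return targets
```

For a four-person table with seats at 0, 90, 180, and 270 degrees and all heads facing the center, each person's estimated target is the participant directly opposite; turning a head rotates the target accordingly.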
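The diarization pipeline described above (VAD, then DOA estimation, then sound source clustering) can likewise be sketched in simplified form. This hypothetical example assumes per-frame input of a voiced/unvoiced flag and a dominant-source azimuth, and clusters voiced frames greedily by angular proximity; the greedy online rule and the `merge_deg` threshold are assumptions for illustration, not the paper's algorithm.

```python
def diarize(frames, merge_deg=20.0):
    """Assign voiced frames to speakers by clustering DOA azimuths.

    frames: list of (voiced, doa_deg) pairs, one per audio frame, where
            voiced is the VAD decision and doa_deg the estimated azimuth.
    Returns one speaker label per frame (None for unvoiced frames).
    A voiced frame joins the nearest existing cluster within merge_deg
    degrees of its centroid; otherwise it starts a new speaker cluster.
    """
    centroids = []  # running mean azimuth per speaker
    counts = []     # frames assigned to each speaker
    labels = []
    for voiced, doa in frames:
        if not voiced:
            labels.append(None)
            continue
        best, best_d = None, float("inf")
        for k, c in enumerate(centroids):
            # wrapped angular distance in degrees
            d = abs((doa - c + 180.0) % 360.0 - 180.0)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > merge_deg:
            centroids.append(doa)
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            # running mean update (wraparound ignored for simplicity)
            centroids[best] += (doa - centroids[best]) / counts[best]
            labels.append(best)
    return labels
```

In a real system the DOA estimates would come from the triangular microphone array; here the clustering step simply groups nearby azimuths, so two speakers seated close together would need a tighter threshold to be separated.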