Group dynamics and multimodal interaction modeling using a smart digital signage
ECCV '12: Proceedings of the 12th European Conference on Computer Vision - Volume Part I
In this paper, we present a novel multimodal system designed for smooth multi-party human-machine interaction. HCI with multiple users is challenging because the system must respond consistently to simultaneous actions and reactions from different users. The proposed system consists of a digital signage panel (a large display) equipped with multiple sensing devices: a 19-channel microphone array, six HD video cameras (three mounted on top of the display and three on the bottom), and two depth sensors. The display can show various content, such as a poster presentation or multiple windows (e.g., web browsers and photos). Multiple users standing in front of the panel can interact freely by voice or gesture while looking at the displayed content, without wearing any dedicated device (such as motion-capture sensors or head-mounted equipment). Acoustic and visual information are processed jointly using state-of-the-art techniques to estimate each user's speech activity and gaze direction, so that the displayed content can be adapted to the users' interests.
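The abstract does not detail how the per-user estimates drive content adaptation. The following Python sketch illustrates one plausible final step, assuming the pipeline already produces per-user speech activity (e.g., from microphone-array source localization) and gaze direction (e.g., from depth-based head-pose estimation): gaze rays are projected onto the display plane, mapped to content regions, and the region attended by the most users, with speakers weighted higher, is selected for adaptation. All names, the region layout, and the speaker weighting are hypothetical illustrations, not taken from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class UserState:
    # Hypothetical per-user estimates from the audio-visual pipeline.
    user_id: int
    is_speaking: bool      # from microphone-array source localization
    gaze_yaw_deg: float    # gaze yaw relative to the display normal
    position_m: tuple      # (x, z) position in front of the panel, metres

def gaze_hit_x(user: UserState) -> float:
    """Project the user's gaze ray onto the display plane (z = 0).

    Returns the horizontal hit point in metres along the display,
    assuming the user faces the panel (gazes in the -z direction).
    """
    x, z = user.position_m
    # Horizontal offset grows with distance to the panel and yaw angle.
    return x + z * math.tan(math.radians(user.gaze_yaw_deg))

def attended_region(user: UserState, region_edges: list) -> int:
    """Map the gaze hit point to one of the content regions on the panel."""
    hit = gaze_hit_x(user)
    for i, (left, right) in enumerate(region_edges):
        if left <= hit < right:
            return i
    return -1  # gaze falls outside the display

def select_content_to_adapt(users: list, region_edges: list) -> int:
    """Pick the region attended by the most users, weighting speakers higher.

    The 2:1 speaker weighting is an arbitrary choice for illustration.
    """
    votes = {}
    for u in users:
        r = attended_region(u, region_edges)
        if r >= 0:
            votes[r] = votes.get(r, 0) + (2 if u.is_speaking else 1)
    return max(votes, key=votes.get) if votes else -1

if __name__ == "__main__":
    # Two users 1.5 m from a panel split into three 1 m-wide regions.
    regions = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
    users = [
        UserState(0, True, 15.0, (0.8, 1.5)),    # speaking, looking right
        UserState(1, False, -5.0, (1.4, 1.5)),   # silent, looking slightly left
    ]
    print("Adapt region:", select_content_to_adapt(users, regions))
```

In this toy configuration both gaze rays land in the middle region, which is therefore selected; a real system would of course smooth these estimates over time rather than react to instantaneous gaze.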