Group dynamics and multimodal interaction modeling using a smart digital signage

  • Authors:
  • Tony Tung; Randy Gomez; Tatsuya Kawahara; Takashi Matsuyama

  • Affiliations:
  • Academic Center for Computing and Media Studies, and Graduate School of Informatics, Kyoto University, Japan (all authors)

  • Venue:
  • ECCV'12: Proceedings of the 12th European Conference on Computer Vision - Volume Part I
  • Year:
  • 2012

Abstract

This paper presents a new multimodal system for group dynamics and interaction analysis. The framework is composed of a microphone array and multiview video cameras placed on a digital signage display, which serves as a support for interaction. We show that visual information processing can be used to localize nonverbal communication events and synchronize them with audio information. Our contribution is twofold: 1) we present a scalable, portable system for multimodal interaction sensing of multiple people, and 2) we propose a general framework to model audio-visual (A/V) multimodal interaction that employs speaker diarization for audio processing and hybrid dynamical systems (HDS) for video processing. HDS are used to represent communication dynamics between multiple people by capturing the characteristics of temporal structures in head motions. Experimental results show real-world situations of group communication processing for joint attention estimation. We believe the proposed framework is very promising for further research.
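
The abstract describes the modeling idea at a high level; the Python sketch below is only an illustration of that idea, not the authors' implementation. It segments a head-yaw trace into intervals, each explained by one of a small bank of simple linear dynamics (the core intuition behind HDS), and then intersects the non-static intervals with speaker-diarization output as a toy proxy for audio-visual event synchronization. The mode drifts, window size, threshold, and synthetic data are illustrative assumptions.

```python
# Illustrative sketch (assumed parameters), not the paper's HDS implementation.
import numpy as np

# Bank of simple dynamics x_{t+1} = x_t + drift, one per mode
# (hold, turn right, turn left), in radians per frame.
MODE_DRIFTS = np.array([0.0, 0.016, -0.016])


def segment_head_motion(yaw, window=10):
    """Assign each fixed-length window of a head-yaw series to the mode
    with the smallest one-step prediction error; a crude stand-in for
    interval-based HDS fitting."""
    labels = []
    for start in range(0, len(yaw) - window + 1, window):
        step = np.diff(yaw[start:start + window])
        errs = [np.mean((step - v) ** 2) for v in MODE_DRIFTS]
        labels.append(int(np.argmin(errs)))
    return labels


def co_occurring_events(motion_labels, speech_flags, window=10):
    """Return window indices where a non-static head-motion mode overlaps
    with detected speech (a toy proxy for A/V event synchronization)."""
    events = []
    for i, mode in enumerate(motion_labels):
        frames = speech_flags[i * window:(i + 1) * window]
        if mode != 0 and np.mean(frames) > 0.5:
            events.append(i)
    return events


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic head-yaw trace: hold, then a turn, then hold again.
    yaw = np.concatenate([
        np.full(50, 0.0),
        np.linspace(0.0, 0.8, 50),
        np.full(50, 0.8),
    ]) + 0.01 * rng.standard_normal(150)
    # Synthetic diarization output: speech active during the turn.
    speech = np.zeros(150)
    speech[40:110] = 1.0

    labels = segment_head_motion(yaw)
    print("mode per window:", labels)
    print("A/V co-occurring windows:", co_occurring_events(labels, speech))
```

In this toy setting the turn segment is labeled with a non-static mode and overlaps the speech segment, so those windows are reported as co-occurring audio-visual events; the actual system in the paper learns the interval dynamics from multiview video and fuses them with full speaker-diarization output.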