Multimodal group action clustering in meetings

Authors:
Dong Zhang;Daniel Gatica-Perez;Samy Bengio;Iain McCowan;Guillaume Lathoud
Affiliations:
IDIAP Research Institute, Martigny, Switzerland;IDIAP Research Institute, Martigny, Switzerland;IDIAP Research Institute, Martigny, Switzerland;IDIAP Research Institute, Martigny, Switzerland;IDIAP Research Institute, Martigny, Switzerland
Venue:
Proceedings of the ACM 2nd international workshop on Video surveillance & sensor networks
Year:
2004

Abstract

We address the problem of clustering multimodal group actions in meetings using a two-layer HMM framework. Meetings are structured as sequences of group actions. Our approach aims at creating one cluster for each group action, where the number of group actions and the action boundaries are unknown a priori. In our framework, the first layer models typical actions of individuals in meetings using supervised HMM learning and low-level audio-visual features. A number of options that explicitly model certain aspects of the data (e.g., asynchrony) were considered. The second layer models the group actions using unsupervised HMM learning. The two layers are linked by a set of probability-based features produced by the individual action layer as input to the group action layer. The methodology was assessed on a set of multimodal turn-taking group actions, using a public five-hour meeting corpus. The results show that the use of multiple modalities and the layered framework are advantageous, compared to various baseline methods.
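
The link between the two layers can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: discrete observation symbols (the paper uses low-level audio-visual features), two hypothetical individual-action classes ("speaking" and "silent") with hand-picked toy parameters rather than trained ones, and a uniform prior over classes when converting likelihoods to posteriors. All function and variable names are illustrative. Each window of each participant is scored under every individual-action HMM with the forward algorithm, the scores are normalized to class posteriors, and the posteriors of all participants are concatenated into one feature vector per window. Vectors of this kind are what a group-action layer could take as input; the unsupervised HMM learning of that second layer is not shown.

```python
import numpy as np


def make_model(pi, A, B):
    """Store a discrete HMM as log-parameters (initial, transition, emission)."""
    return tuple(np.log(np.asarray(x, dtype=float)) for x in (pi, A, B))


def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Sum over previous states (in the log domain) for each next state.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)


def action_posteriors(obs, models):
    """Posterior over individual-action classes for one window (uniform prior)."""
    ll = np.array([log_forward(obs, *m) for m in models])
    p = np.exp(ll - ll.max())  # subtract the max for numerical stability
    return p / p.sum()


def group_features(windows_per_person, models):
    """One feature vector per window: all participants' posteriors concatenated."""
    n_windows = len(windows_per_person[0])
    return np.array([
        np.concatenate([action_posteriors(person[t], models)
                        for person in windows_per_person])
        for t in range(n_windows)
    ])


# Toy individual-action models (hypothetical parameters, not trained).
# Symbol 1 stands for "audio activity", symbol 0 for "no activity".
stay = [[0.9, 0.1], [0.1, 0.9]]
speaking = make_model([0.5, 0.5], stay, [[0.1, 0.9], [0.2, 0.8]])
silent = make_model([0.5, 0.5], stay, [[0.9, 0.1], [0.8, 0.2]])
models = [speaking, silent]

# Two participants, two windows each.
person_a = [[1, 1, 1, 0], [0, 0, 0, 0]]
person_b = [[0, 0, 0, 0], [1, 1, 0, 1]]

feats = group_features([person_a, person_b], models)
# Columns: [P(A speaking), P(A silent), P(B speaking), P(B silent)]
print(feats.round(3))
```

In this toy setup, the first row should favor "A speaking, B silent" and the second row the reverse, which is the kind of turn-taking pattern a second-layer model could then group into clusters.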