Automatic Analysis of Multimodal Group Actions in Meetings
IEEE Transactions on Pattern Analysis and Machine Intelligence
We address the problem of recognizing sequences of human interaction patterns in meetings, with the goal of structuring them in semantic terms. The investigated patterns are inherently group-based (defined by the individual activities of meeting participants and their interplay) and multimodal (as captured by cameras and microphones). By defining a proper set of individual actions, group actions can be modeled as a two-layer process: one layer models basic individual activities from low-level audio-visual features, and the other models the interactions between participants. We propose a two-layer Hidden Markov Model (HMM) framework that implements this concept in a principled manner and that has advantages over previous work. First, by decomposing the problem hierarchically, learning is performed on low-dimensional observation spaces, which results in simpler models. Second, the framework is easier to interpret, as both individual and group actions have a clear meaning, and is thus easier to improve. Third, different HMM models can be used in each layer to better reflect the nature of each subproblem. Our framework is general and extensible, and we illustrate it with a set of eight group actions on a public five-hour meeting corpus. Experiments and a comparison with a single-layer HMM baseline demonstrate its validity.
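The two-layer decomposition can be sketched with discrete HMMs decoded by the standard Viterbi algorithm: the first layer maps each participant's low-level observations to individual actions, and the second layer treats the joint individual actions as observations for a group-action HMM. All state names, observation symbols, and probabilities below are illustrative toys, not the paper's actual models (which use continuous audio-visual features and richer HMM variants):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete-observation HMM."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    state = max(V[-1], key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return list(reversed(path))

# ---- Layer 1: individual actions from low-level symbols (toy setup) ----
indiv_states = ["speaking", "silent"]
start = {"speaking": 0.5, "silent": 0.5}
trans = {"speaking": {"speaking": 0.8, "silent": 0.2},
         "silent":   {"speaking": 0.2, "silent": 0.8}}
# Toy quantized audio-energy symbols stand in for audio-visual features.
emit = {"speaking": {"hi": 0.8, "lo": 0.2},
        "silent":   {"hi": 0.1, "lo": 0.9}}
audio = {"A": ["hi", "hi", "lo", "lo"],
         "B": ["lo", "lo", "hi", "hi"]}
actions = {p: viterbi(o, indiv_states, start, trans, emit)
           for p, o in audio.items()}

# ---- Layer 2: group actions from the joint individual actions ----
joint = [f"{a}/{b}" for a, b in zip(actions["A"], actions["B"])]
group_states = ["monologue", "discussion"]
g_start = {"monologue": 0.5, "discussion": 0.5}
g_trans = {"monologue":  {"monologue": 0.7, "discussion": 0.3},
           "discussion": {"monologue": 0.3, "discussion": 0.7}}
# One active speaker suggests a monologue; otherwise a discussion.
g_emit = {"monologue":  {"speaking/silent": 0.4,  "silent/speaking": 0.4,
                         "speaking/speaking": 0.1, "silent/silent": 0.1},
          "discussion": {"speaking/silent": 0.15, "silent/speaking": 0.15,
                         "speaking/speaking": 0.35, "silent/silent": 0.35}}
group = viterbi(joint, group_states, g_start, g_trans, g_emit)
print(actions, group)
```

Because each layer sees a small, discrete observation alphabet, this sketch mirrors the paper's first stated advantage: each HMM is trained and decoded on a low-dimensional space, and either layer can be swapped for a different model without touching the other.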