Audio-visual multi-person tracking and identification for smart environments

Authors: Keni Bernardin; Rainer Stiefelhagen
Affiliations: Universität Karlsruhe, Karlsruhe, Germany (both authors)
Venue: Proceedings of the 15th international conference on Multimedia
Year: 2007

Abstract

This paper presents a novel system for the automatic and unobtrusive tracking and identification of multiple persons in an indoor environment. Information from several fixed cameras is fused in a particle filter framework to track multiple occupants simultaneously. A set of steerable, fuzzy-controlled pan-tilt-zoom cameras smoothly follows persons of interest and opportunistically captures facial close-ups for face identification. In parallel, speech segmentation, sound source localization, and speaker identification are performed using several far-field microphones and microphone arrays. Information arrives asynchronously and sporadically from these sources, in the form of track updates and spatio-temporally localized visual and acoustic identification cues, and is fused at a higher level to gradually refine the global scene model and increase the system's confidence in the set of recognized identities. The system was trained on a small set of users' faces and/or voices. In natural meeting scenarios it performed well at quickly acquiring the users' identities and at filling in ID information that is missing from any single modality.
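
The higher-level fusion step can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the code below assumes a simple one for illustration only: each track holds a belief over the enrolled users, and every face or voice cue that arrives multiplies that belief by the classifier's scores and renormalizes. The class name `IdentityFusion`, the score format, the floor value, and the user names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class IdentityFusion:
    """Hypothetical helper: accumulates sporadic ID cues per track
    as a normalized belief over the enrolled users."""

    def __init__(self, users, floor=1e-3):
        self.users = list(users)
        self.floor = floor  # assumed minimum score, keeps every hypothesis alive
        # Each new track starts with a uniform belief over the enrolled users.
        self.beliefs = defaultdict(
            lambda: {u: 1.0 / len(self.users) for u in self.users}
        )

    def update(self, track_id, scores):
        """Fold one face or voice cue into the belief for `track_id`.

        `scores` maps user -> likelihood-like score from a single cue.
        Users missing from `scores` receive the floor value.
        """
        belief = self.beliefs[track_id]
        for u in self.users:
            belief[u] *= max(scores.get(u, self.floor), self.floor)
        total = sum(belief.values())
        for u in self.users:
            belief[u] /= total

    def best(self, track_id):
        """Return the most likely user for the track and its belief."""
        belief = self.beliefs[track_id]
        user = max(belief, key=belief.get)
        return user, belief[user]


# Two cues arriving at different times for the same track (made-up scores).
fusion = IdentityFusion(["alice", "bob", "carol"])
fusion.update(1, {"alice": 0.6, "bob": 0.3, "carol": 0.1})  # face close-up
fusion.update(1, {"alice": 0.5, "bob": 0.4, "carol": 0.1})  # speaker ID
name, conf = fusion.best(1)
```

In this sketch the second cue is only weakly discriminative on its own, yet the belief in the leading identity rises above what the first cue alone produced. This mirrors the abstract's point that sporadic cues from different modalities can gradually increase confidence in a track's identity.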