Probabilistic integration of sparse audio-visual cues for identity tracking

  • Authors:
  • Keni Bernardin; Rainer Stiefelhagen; Alex Waibel

  • Affiliations:
  • Universität Karlsruhe, Karlsruhe, Germany; Universität Karlsruhe, Karlsruhe, Germany; Universität Karlsruhe, Karlsruhe, Germany

  • Venue:
  • MM '08: Proceedings of the 16th ACM international conference on Multimedia
  • Year:
  • 2008

Abstract

In the context of smart environments, the ability to track and identify persons is a key factor determining the scope and flexibility of the analytical components or intelligent services that can be provided. While a considerable amount of work has addressed camera-based tracking of multiple users in a variety of scenarios, technologies for acoustic and visual identification, such as face or voice ID, are unfortunately still subject to severe limitations when distantly placed sensors must be used. As a consequence, reliable cues for identification can be hard to obtain without user cooperation, especially when multiple users are involved. In this paper, we present a novel technique for the tracking and identification of multiple persons in a smart environment using distantly placed audio-visual sensors. The technique builds on the opportunistic integration of tracking cues as well as face and voice identification cues, gained from several cameras and microphones, whenever these cues can be captured with a sufficient degree of confidence. A probabilistic model keeps track of identified persons and updates the belief in their identities whenever new observations become available. The technique has been systematically evaluated on the CLEAR Interactive Seminar database, a large audio-visual corpus of realistic meeting scenarios captured in a variety of smart rooms.
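
The abstract's closing sentences describe an opportunistic, probabilistic fusion of sparse ID cues. As an illustration only (the abstract does not specify the model), the sketch below shows one common way such a belief update can be realized: each tracked person carries a discrete belief over the enrolled identities, updated with Bayes' rule whenever a face or voice ID cue arrives with sufficient confidence. The identity names, the recognizer confusion model, and the confidence values are all assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): per-track Bayesian
# belief over enrolled identities, updated from noisy face/voice ID cues.
import numpy as np

IDENTITIES = ["alice", "bob", "carol"]  # hypothetical enrolled identities

class TrackedPerson:
    def __init__(self, n_ids=len(IDENTITIES)):
        # Start from a uniform prior over all enrolled identities.
        self.belief = np.full(n_ids, 1.0 / n_ids)

    def update(self, observed_id, p_correct=0.8):
        """Fuse one ID cue (e.g. a face or voice recognition result).

        p_correct is the assumed probability that the recognizer reports
        the true identity; the rest is spread over the other identities.
        """
        k = IDENTITIES.index(observed_id)
        likelihood = np.full(len(self.belief),
                             (1.0 - p_correct) / (len(self.belief) - 1))
        likelihood[k] = p_correct
        posterior = likelihood * self.belief       # Bayes' rule (unnormalized)
        self.belief = posterior / posterior.sum()  # renormalize

    def best_identity(self):
        return IDENTITIES[int(np.argmax(self.belief))], float(self.belief.max())

# Example: two face cues and one conflicting voice cue for the same track.
person = TrackedPerson()
for cue, conf in [("alice", 0.7), ("alice", 0.8), ("bob", 0.6)]:
    person.update(cue, p_correct=conf)
print(person.best_identity())  # belief concentrates on "alice"
```

In such a scheme, cues below a confidence threshold are simply discarded rather than fused, which is one plausible reading of the "sufficient degree of confidence" condition mentioned in the abstract.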