Audio-visual tracking for natural interactivity

  • Authors:
  • Gopal Pingali, Gamze Tunali, Ingrid Carlbom

  • Affiliations:
  • Bell Laboratories, Lucent Technologies, 600 Mountain Avenue, Murray Hill, NJ (all authors)

  • Venue:
  • MULTIMEDIA '99: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1)
  • Year:
  • 1999

Abstract

The goal in user interfaces is natural interactivity, unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach using inverse modeling techniques from computer vision, speech recognition, and acoustics can result in such interfaces. In particular, we demonstrate a system for audio-visual tracking, showing that it is more robust, more accurate, and more compact, and yields more information, than tracking with a single modality. We also demonstrate how such a system can be used to find the talker in a group of individuals and to render 3D scenes to the user.
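
The abstract does not spell out the tracking machinery, but a minimal sketch can illustrate two representative ingredients: estimating a talker's bearing acoustically from a microphone pair via the time delay of arrival (TDOA), and fusing that estimate with a vision-based bearing. The sketch below is in Python with NumPy; the microphone spacing, sample rate, simulated signals, per-modality variances, and the inverse-variance fusion rule are all illustrative assumptions, not the authors' method.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
SPEED_OF_SOUND = 343.0   # m/s (air, ~20 C)
MIC_SPACING = 0.3        # m between the two microphones
SAMPLE_RATE = 16000      # Hz

def tdoa_bearing(sig_left, sig_right):
    """Estimate talker bearing (radians) from a two-microphone pair.

    The peak of the cross-correlation gives the time delay of arrival
    (TDOA); under a far-field assumption, delay = spacing * sin(angle) / c.
    """
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_right) - 1)   # delay in samples
    delay = lag / SAMPLE_RATE                           # delay in seconds
    sin_angle = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.arcsin(sin_angle))

def fuse_bearings(audio_bearing, audio_var, visual_bearing, visual_var):
    """Inverse-variance weighting: the more certain modality dominates."""
    w_a, w_v = 1.0 / audio_var, 1.0 / visual_var
    return (w_a * audio_bearing + w_v * visual_bearing) / (w_a + w_v)

if __name__ == "__main__":
    # Synthetic demo: a talker at ~20 degrees off the microphone axis.
    true_angle = np.deg2rad(20.0)
    delay_samples = int(round(MIC_SPACING * np.sin(true_angle)
                              / SPEED_OF_SOUND * SAMPLE_RATE))

    rng = np.random.default_rng(0)
    src = rng.standard_normal(SAMPLE_RATE // 4)  # 0.25 s of noise as stand-in speech
    right = src
    left = np.roll(src, delay_samples)           # same signal, delayed at the left mic

    audio_est = tdoa_bearing(left, right)
    visual_est = np.deg2rad(22.0)                # stand-in for a vision tracker's output
    fused = fuse_bearings(audio_est, 0.02, visual_est, 0.01)

    print(f"audio bearing : {np.rad2deg(audio_est):5.1f} deg")
    print(f"visual bearing: {np.rad2deg(visual_est):5.1f} deg")
    print(f"fused bearing : {np.rad2deg(fused):5.1f} deg")
```

On this synthetic input the audio-only estimate lands near the true 20 degrees, and the fused estimate is pulled toward the (assumed lower-variance) visual measurement, mirroring the abstract's claim that combining modalities yields a more robust estimate than either modality alone.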