The goal in user interfaces is natural interactivity unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach using inverse modeling techniques from computer vision, speech recognition, and acoustics can result in such interfaces. In particular, we demonstrate a system for audio-visual tracking, showing that such a system is more robust, more accurate, and more compact, and yields more information, than tracking with a single modality. We also demonstrate how such a system can be used to find the talker among a group of individuals and to render 3D scenes to the user.