Action recognition has traditionally focused on processing fixed-camera observations while ignoring non-visual information. In this paper, we explore the dynamics of body-part movements in natural tasks, where eye, head and hand movements are tightly coupled with the ongoing task. In light of this, our method takes an agent-centered view and incorporates an extensive description of eye-head-hand coordination. Because the course of gaze and head movements can be tracked, our approach uses gaze and head cues to detect agent-centered attention switches, which are then used to segment an action sequence into action units. Once those action primitives are recognized, parallel hidden Markov models are applied to model and integrate the probabilistic sequences of action units from the different body parts. An experimental system built to recognize human behaviors in three natural tasks ("unscrewing a jar", "stapling a letter" and "pouring water") demonstrates the effectiveness of the approach.
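The two stages described above, segmenting at attention switches and integrating per-body-part sequences with parallel hidden Markov models, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an attention switch can be approximated by gaze speed crossing a fixed threshold, that each body part yields a non-empty sequence of discrete action-unit symbols, and that the parallel models are combined by summing per-stream log-likelihoods as if the streams were independent. All function names, parameters and model values are hypothetical.

```python
import math


def segment_by_attention_switch(gaze_speed, threshold):
    """Return half-open (start, end) frame ranges of low-speed gaze runs.

    Assumption: frames with speed >= threshold are treated as attention
    switches that separate consecutive action units.
    """
    units, start = [], None
    for i, speed in enumerate(gaze_speed):
        if speed < threshold:
            if start is None:
                start = i  # a new low-speed run begins here
        elif start is not None:
            units.append((start, i))  # the switch closes the current run
            start = None
    if start is not None:
        units.append((start, len(gaze_speed)))
    return units


def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM: log P(obs | model).

    pi[s] is the initial probability, A[r][s] the transition probability,
    B[s][k] the probability of emitting symbol k in state s.
    Assumes obs is non-empty.
    """
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * A[r][s] for r in range(n)) * B[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify_parallel(streams, task_models):
    """Pick the task whose per-stream HMMs jointly best explain the streams.

    streams: {stream_name: symbol sequence}
    task_models: {task: {stream_name: (pi, A, B)}}
    Assumption: streams are scored independently and log-likelihoods summed.
    """
    scores = {
        task: sum(
            forward_log_likelihood(streams[name], *models[name])
            for name in streams
        )
        for task, models in task_models.items()
    }
    return max(scores, key=scores.get), scores
```

In this sketch, each task owns one small HMM per body part (for example, one for gaze and one for hand), and a test recording is assigned to the task with the highest summed score. A real system would need learned model parameters and a more careful attention-switch detector than a fixed speed threshold.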