Pose and gaze estimation in multi-camera networks for non-restrictive HCI

Authors:
Chung-Ching Chang;Chen Wu;Hamid Aghajan
Affiliations:
Wireless Sensor Networks Lab, Stanford University, Stanford, CA;Wireless Sensor Networks Lab, Stanford University, Stanford, CA;Wireless Sensor Networks Lab, Stanford University, Stanford, CA
Venue:
HCI'07 Proceedings of the 2007 IEEE international conference on Human-computer interaction
Year:
2007

Abstract

Multi-camera networks offer potential for a variety of novel human-centric applications by providing rich visual information. In this paper, face orientation analysis and posture analysis are combined as components of a human-centered interface system that allows the user's intentions and region of interest to be estimated without requiring carried or wearable sensors. In pose estimation, image observations at the cameras are first locally reduced to parametric descriptions, and Particle Swarm Optimization (PSO) is then used to optimize the kinematic chain of the 3D human model. In face analysis, a discrete-time linear dynamical system (LDS), based on the kinematics of the head, combines the local estimates of the user's gaze angle produced by the cameras and employs spatiotemporal filters to correct any inconsistencies. Knowing the intention and the region of interest of the user facilitates further interpretation of human behavior, which is the key to non-restrictive and intuitive human-centered interfaces. Applications in assisted living, speaker tracking, and gaming can benefit from such unobtrusive interfaces.
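
As an illustration of the optimization step named in the abstract, the following is a minimal sketch of a generic global-best PSO in Python. It is not the authors' implementation: the function `pso_minimize`, its parameter values, and the toy cost (squared distance to a made-up vector of joint angles) are all assumptions chosen for illustration. In a real system the cost would presumably measure the mismatch between the projected 3D model and the cameras' parametric descriptions.

```python
import random


def pso_minimize(cost, dim, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize cost(x) over a box with a basic global-best PSO (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # best position seen by each particle
    best_cost = [cost(p) for p in pos]      # cost at each particle's best position
    g = min(range(n_particles), key=lambda i: best_cost[i])
    gbest_pos, gbest_cost = best_pos[g][:], best_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (gbest_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < gbest_cost:
                    gbest_cost, gbest_pos = c, pos[i][:]
    return gbest_pos, gbest_cost


# Hypothetical target joint angles (radians), standing in for the true pose.
TARGET_ANGLES = [0.3, -1.2, 0.8]


def toy_pose_cost(angles):
    """Toy stand-in for a model-to-observation mismatch score."""
    return sum((a - t) ** 2 for a, t in zip(angles, TARGET_ANGLES))


if __name__ == "__main__":
    angles, err = pso_minimize(toy_pose_cost, dim=3, bounds=(-3.14, 3.14))
    print("estimated angles:", angles, "cost:", err)
```

PSO needs only cost evaluations, not gradients, which makes it a plausible fit for pose costs built from image-derived descriptions; the inertia and acceleration coefficients above are common textbook-style values, not ones reported by the paper.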