3D head pose and gaze tracking and their application to diverse multimodal tasks
Proceedings of the 15th ACM on International conference on multimodal interaction
We propose a gaze sensing method using visual saliency maps that does not need explicit personal calibration. Our goal is to create a gaze estimator using only the eye images captured from a person watching a video clip. Our method treats the saliency maps of the video frames as probability distributions of the gaze points. We aggregate the saliency maps based on the similarity of the eye images to efficiently identify the gaze points from the saliency maps. We establish a mapping from the eye images to the gaze points by using Gaussian process regression. In addition, we use a feedback loop from the gaze estimator to refine the gaze probability maps and improve the accuracy of the gaze estimation. The experimental results show that the proposed method works well with different people and video clips and achieves an accuracy of 3.5 degrees, which is sufficient for estimating a user's attention on a display.
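The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the eye-image features, the similarity measure, or the kernel, so the RBF similarity, the use of the expected position as the gaze point, and all function and parameter names (`rbf_kernel`, `aggregate_saliency`, `expected_gaze`, `gp_fit_predict`, `length_scale`, `noise`) are assumptions made for illustration. The feedback loop is omitted.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """RBF similarity between every row of a and every row of b (assumed kernel)."""
    d2 = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def aggregate_saliency(eye_feats, saliency_maps, length_scale=1.0):
    """Average the saliency maps of frames whose eye images look alike.

    eye_feats:     (n, d) hypothetical eye-image feature vectors
    saliency_maps: (n, h, w) non-negative maps, one per video frame
    Returns (n, h, w) maps, each normalized to sum to 1.
    """
    weights = rbf_kernel(eye_feats, eye_feats, length_scale)
    weights /= weights.sum(axis=1, keepdims=True)
    agg = np.tensordot(weights, saliency_maps, axes=(1, 0))
    return agg / agg.sum(axis=(1, 2), keepdims=True)

def expected_gaze(prob_maps):
    """Expected (x, y) pixel position under each probability map."""
    _, height, width = prob_maps.shape
    ys, xs = np.mgrid[0:height, 0:width]
    x = (prob_maps * xs).sum(axis=(1, 2))
    y = (prob_maps * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=1)

def gp_fit_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """Gaussian process regression, posterior mean only, with a constant mean."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train, length_scale)
    mean = y_train.mean(axis=0)
    alpha = np.linalg.solve(k, y_train - mean)
    return k_star @ alpha + mean
```

In this sketch, a gaze estimator would be trained by calling `aggregate_saliency` on the recorded eye features and frame saliency maps, converting the result to points with `expected_gaze`, and passing those points as `y_train` to `gp_fit_predict`. A refinement step in the spirit of the feedback loop could reweight the probability maps around the predicted points and repeat, but that step is left out here.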