Understanding interactions and guiding visual surveillance by tracking attention

Authors:
Ian Reid;Ben Benfold;Alonso Patron;Eric Sommerlade
Affiliations:
Department of Engineering Science, University of Oxford, Oxford, UK (all authors)
Venue:
ACCV'10 Proceedings of the 2010 international conference on Computer vision - Volume Part I
Year:
2010

Abstract

The central tenet of this paper is that determining where people are looking simplifies other tasks involved in understanding and interrogating a scene. To this end we describe a fully automatic method for determining a person's attention, based on real-time visual tracking of their head and a coarse classification of their head pose. We estimate the head pose, or coarse gaze, using randomised ferns whose decision branches are based on both histograms of gradient orientations and colour-based features. We demonstrate the value of the coarse gaze in three applications: (i) by building static and temporally varying maps of the areas where people look, we identify interesting regions; (ii) by determining the gaze of people in the scene, we more effectively control a multi-camera surveillance system to acquire faces for identification; (iii) by identifying where people are looking, we more effectively classify human interactions.
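
The abstract names randomised ferns as the head-pose classifier but gives no implementation details, so the following is only a minimal sketch of the generic technique, not the authors' method. It assumes each head image has already been reduced to a fixed-length descriptor (a stand-in for the gradient-orientation and colour features), that each binary test compares two randomly chosen descriptor entries, and that class priors are uniform. The class name `RandomFerns`, the helper `make_toy_data`, the eight gaze-direction bins, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


class RandomFerns:
    """Semi-naive Bayes classifier over random ferns (illustrative sketch)."""

    def __init__(self, n_ferns, fern_depth, n_features, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: each binary test compares two random descriptor entries.
        self.idx_a = rng.integers(0, n_features, size=(n_ferns, fern_depth))
        self.idx_b = rng.integers(0, n_features, size=(n_ferns, fern_depth))
        self.weights = 2 ** np.arange(fern_depth)
        # Counts start at 1 so that no (leaf, class) pair has zero probability.
        self.counts = np.ones((n_ferns, 2 ** fern_depth, n_classes))

    def _leaves(self, X):
        # Evaluate every test, then pack each fern's bits into one leaf index.
        bits = X[:, self.idx_a] > X[:, self.idx_b]      # (samples, ferns, depth)
        return bits.astype(np.int64) @ self.weights     # (samples, ferns)

    def fit(self, X, y):
        fern_ids = np.arange(self.counts.shape[0])
        for leaf_row, label in zip(self._leaves(X), y):
            # One histogram update per fern for this training sample.
            self.counts[fern_ids, leaf_row, label] += 1
        return self

    def predict_log_proba(self, X):
        leaves = self._leaves(X)
        fern_ids = np.arange(self.counts.shape[0])
        # P(leaf | class) for each fern, normalised over that fern's leaves.
        log_p = np.log(self.counts / self.counts.sum(axis=1, keepdims=True))
        # Ferns are treated as independent: sum their log-likelihoods.
        # Result is an unnormalised class score (uniform class prior assumed).
        return log_p[fern_ids, leaves, :].sum(axis=1)   # (samples, classes)

    def predict(self, X):
        return self.predict_log_proba(X).argmax(axis=1)


def make_toy_data(n_per_class, prototypes, noise, rng):
    """Noisy copies of one prototype descriptor per class (synthetic data)."""
    n_classes, n_features = prototypes.shape
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = prototypes[y] + noise * rng.normal(size=(y.size, n_features))
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 8 hypothetical coarse gaze-direction bins, 16-dimensional descriptors.
    prototypes = 3.0 * rng.normal(size=(8, 16))
    X_train, y_train = make_toy_data(50, prototypes, 0.3, rng)
    X_test, y_test = make_toy_data(20, prototypes, 0.3, rng)
    clf = RandomFerns(n_ferns=20, fern_depth=6, n_features=16, n_classes=8)
    clf.fit(X_train, y_train)
    print("toy accuracy:", np.mean(clf.predict(X_test) == y_test))
```

Each fern is a small set of binary tests whose outcomes index one leaf of a per-class histogram, and summing log-likelihoods across ferns keeps both training and classification cheap, which suits the real-time setting the abstract describes. The synthetic data only exercises the classifier and says nothing about performance on real head images.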