Detecting People Looking at Each Other in Videos

Authors:
M. J. Marin-Jimenez;A. Zisserman;M. Eichner;V. Ferrari
Affiliations:
Department of Computing and Numerical Analysis, Maimonides Institute for Biomedical Research (IMIBIC), University of Cordoba, Cordoba, Spain;Department of Engineering Science, University of Oxford, Oxford, UK;ETH Zurich, Zurich, Switzerland;School of Informatics, University of Edinburgh, Edinburgh, UK
Venue:
International Journal of Computer Vision
Year:
2014



Abstract

The objective of this work is to determine whether people are interacting in TV video by detecting whether they are looking at each other. We determine the temporal period of the interaction and spatially localize the relevant people. We make the following four contributions: (i) head detection with implicit coarse pose information (front, profile, back); (ii) continuous head pose estimation in unconstrained scenarios (TV video) using Gaussian process regression; (iii) the proposal and evaluation of several methods for assessing whether and when pairs of people are looking at each other in a video shot; and (iv) new ground truth annotation for this task, extending the TV human interactions dataset (Patron-Perez et al. 2010). The performance of the methods is evaluated on this dataset, which consists of 300 video clips extracted from TV shows. Despite the variety and difficulty of this video material, our best method obtains an average precision of 87.6% in a fully automatic manner.
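
To make the "looking at each other" test in contribution (iii) concrete, the following is a minimal geometric sketch, not the authors' method. It assumes each head is summarized by a 2D image position and a yaw angle, and that a head "looks toward" another when its gaze direction lies within an angular tolerance of the line joining the two heads. The function names, the yaw convention (0 degrees points toward +x), and the 30-degree tolerance are all illustrative assumptions that do not come from the abstract.

```python
import math


def looks_toward(src, yaw_deg, dst, max_angle_deg=30.0):
    """Return True if a head at `src` with yaw `yaw_deg` faces the point `dst`.

    Assumed convention (hypothetical): yaw 0 points toward +x in the image,
    yaw 180 points toward -x. Positions are (x, y) tuples in pixels.
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return False  # coincident heads: direction is undefined
    yaw = math.radians(yaw_deg)
    gaze_x, gaze_y = math.cos(yaw), math.sin(yaw)
    # Cosine of the angle between the gaze vector and the direction to dst,
    # clamped to [-1, 1] to guard against floating-point drift.
    cos_angle = max(-1.0, min(1.0, (gaze_x * dx + gaze_y * dy) / dist))
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


def looking_at_each_other(head_a, yaw_a, head_b, yaw_b, max_angle_deg=30.0):
    """Two heads look at each other if each one faces the other."""
    return (looks_toward(head_a, yaw_a, head_b, max_angle_deg)
            and looks_toward(head_b, yaw_b, head_a, max_angle_deg))


if __name__ == "__main__":
    # Head A on the left facing right, head B on the right facing left.
    print(looking_at_each_other((100, 100), 0.0, (300, 100), 180.0))
    # Both heads face right, so B looks away from A.
    print(looking_at_each_other((100, 100), 0.0, (300, 100), 0.0))
```

In a video setting, a per-frame decision like this would still need to be aggregated over time to recover the temporal period of the interaction; that step is left out of the sketch.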