International Journal of Computer Vision
The objective of this work is to determine whether people are interacting in TV video by detecting whether or not they are looking at each other. We determine the temporal extent of the interaction and also spatially localize the people involved. We make the following four contributions: (i) head detection with implicit coarse pose information (front, profile, back); (ii) continuous head pose estimation in unconstrained scenarios (TV video) using Gaussian process regression; (iii) several methods, proposed and evaluated here, for assessing whether and when pairs of people are looking at each other in a video shot; and (iv) new ground truth annotation for this task, extending the TV human interactions dataset (Patron-Perez et al. 2010). The performance of the methods is evaluated on this dataset, which consists of 300 video clips extracted from TV shows. Despite the variety and difficulty of this video material, our best method obtains an average precision of 87.6% in a fully automatic manner.
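To make the task in contribution (iii) concrete, the following is a minimal sketch of one possible geometric test for a pair of detected heads. It is not the method evaluated in this work. It assumes each head is given as an image-plane centre plus a yaw angle, and that yaw follows a hypothetical convention in which 0 degrees faces right and 180 degrees faces left. The function names and the 30 degree tolerance are illustrative assumptions only.

```python
import math


def gaze_direction(yaw_deg):
    """Unit gaze vector in the image plane.

    Assumed (hypothetical) convention: 0 deg faces +x (right),
    180 deg faces -x (left).
    """
    yaw = math.radians(yaw_deg)
    return (math.cos(yaw), math.sin(yaw))


def looks_at(src_center, src_yaw_deg, dst_center, max_angle_deg=30.0):
    """True if the gaze from src points towards dst within max_angle_deg."""
    dx = dst_center[0] - src_center[0]
    dy = dst_center[1] - src_center[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return False  # coincident heads: direction is undefined
    gx, gy = gaze_direction(src_yaw_deg)
    # Cosine of the angle between the gaze and the line joining the heads,
    # clamped to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, (gx * dx + gy * dy) / norm))
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


def looking_at_each_other(head_a, head_b, max_angle_deg=30.0):
    """Each head is a ((x, y), yaw_deg) pair; both must look at the other."""
    (center_a, yaw_a), (center_b, yaw_b) = head_a, head_b
    return (looks_at(center_a, yaw_a, center_b, max_angle_deg)
            and looks_at(center_b, yaw_b, center_a, max_angle_deg))
```

For example, a head at (100, 100) facing right and a head at (300, 110) facing left pass the test, whereas two heads both facing right do not. A per-frame decision of this kind would still need to be aggregated over a shot to obtain the temporal extent of the interaction.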