Human focused action localization in video

Authors:
Alexander Kläser; Marcin Marszałek; Cordelia Schmid; Andrew Zisserman
Affiliations:
INRIA Grenoble, LEAR, LJK, France; Engineering Science, University of Oxford, UK; INRIA Grenoble, LEAR, LJK, France; Engineering Science, University of Oxford, UK
Venue:
ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I
Year:
2010

Abstract

We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions both in time (across the video) and in space (within each frame). We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these tracks using a sliding-window classifier. We make the following contributions: (i) we show that splitting the action localization task into a spatial and a temporal search leads to an efficient localization algorithm in which generic human tracks can be reused to recognize multiple human actions; (ii) we develop a human detector and tracker that copes with a wide range of postures, articulations, motions and camera viewpoints, and that includes detection interpolation and a principled classification stage to suppress false-positive tracks; (iii) we propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) we introduce a new action localization dataset based on Hollywood movies. Results are presented on a number of real-world movies with crowded, dynamic environments, partial occlusion and cluttered backgrounds. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new Hollywood-Localization dataset.
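The two-stage search summarized above (generic human tracks first, then a temporal sliding window within each track) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the (x1, y1, x2, y2) box format, linear interpolation of box corners as the detection-interpolation step, and greedy temporal non-maximum suppression are all hypothetical choices made for illustration. `score_window` is a placeholder for a trained per-action classifier; in the paper that classifier operates on track-aligned 3D-HOG features, which are not implemented here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical conventions for this sketch (not from the paper): a box is
# (x1, y1, x2, y2) in pixels, and a track maps frame index -> box.
Box = Tuple[float, ...]
Track = Dict[int, Box]


def interpolate_track(detections: Track) -> Track:
    """Fill frames between sparse detections by linearly interpolating box corners."""
    frames = sorted(detections)
    if not frames:
        return {}
    track: Track = {}
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = detections[f0], detections[f1]
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)  # 0 at f0, approaching 1 at f1
            track[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    track[frames[-1]] = tuple(float(v) for v in detections[frames[-1]])
    return track


def temporal_search(
    track: Track,
    score_window: Callable[[List[Box]], float],
    lengths: Sequence[int],
    stride: int = 1,
) -> List[Tuple[float, int, int]]:
    """Score every temporal window of each length along a contiguous track.

    Returns (score, first_frame, last_frame) tuples, best first.
    """
    if not track:
        return []
    first, last = min(track), max(track)
    results = []
    for length in lengths:
        # Last valid start s satisfies s + length - 1 <= last.
        for s in range(first, last - length + 2, stride):
            e = s + length - 1
            boxes = [track[f] for f in range(s, e + 1)]
            results.append((score_window(boxes), s, e))
    results.sort(reverse=True)
    return results


def temporal_iou(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Intersection-over-union of two inclusive frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def temporal_nms(results, overlap: float = 0.5):
    """Greedy NMS (an assumed post-processing step): keep windows not overlapping a better one."""
    kept = []
    for score, s, e in results:
        if all(temporal_iou((s, e), (ks, ke)) < overlap for _, ks, ke in kept):
            kept.append((score, s, e))
    return kept


if __name__ == "__main__":
    # Two detections four frames apart; frames 1-3 are interpolated.
    track = interpolate_track({0: (0, 0, 10, 20), 4: (4, 0, 14, 20)})
    print(track[2])  # (2.0, 0.0, 12.0, 20.0)

    # Toy stand-ins for per-action classifiers, all reusing the same track.
    classifiers = {
        "moves_right": lambda boxes: sum(b[0] for b in boxes) / len(boxes),
        "stays_left": lambda boxes: -sum(b[0] for b in boxes) / len(boxes),
    }
    for action, clf in classifiers.items():
        print(action, temporal_nms(temporal_search(track, clf, lengths=[2, 3]))[0])
```

Because `temporal_search` takes only a track and a scoring function, the same interpolated track can be scanned by several action classifiers, which is the kind of reuse contribution (i) relies on: the spatial search is paid once per track rather than once per action.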