We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions both temporally within the video and spatially within each frame. We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these tracks using a sliding-window classifier. We make the following contributions: (i) we show that splitting the action localization task into a spatial and a temporal search yields an efficient localization algorithm in which generic human tracks can be reused to recognize multiple human actions; (ii) we develop a human detector and tracker that copes with a wide range of postures, articulations, motions and camera viewpoints, where the tracker includes detection interpolation and a principled classification stage to suppress false-positive tracks; (iii) we propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) we introduce a new action localization dataset based on Hollywood movies. Results are presented on a number of real-world movies with crowded, dynamic environments, partial occlusions and cluttered backgrounds. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new Hollywood-Localization dataset.
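The split between a spatial search (generic human tracks, computed once) and a temporal search (a sliding window run along each track, per action) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the box format `(x1, y1, x2, y2)`, plain linear interpolation of box corners between detections, the window lengths and stride, greedy temporal-IoU non-maximum suppression, and the scoring function `score_fn` (a hypothetical stand-in for the track-aligned 3D-HOG descriptor plus a trained classifier) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
Track = Dict[int, Box]  # frame index -> box of the tracked person
Scored = Tuple[int, int, float]  # (start frame, end frame exclusive, score)


def interpolate_track(detections: Track) -> Track:
    """Fill gaps between sparse detections by linearly interpolating box corners (assumed scheme)."""
    if not detections:
        return {}
    frames = sorted(detections)
    filled: Track = {}
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = detections[f0], detections[f1]
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)
            filled[f] = tuple((1 - t) * a + t * b for a, b in zip(b0, b1))
    filled[frames[-1]] = detections[frames[-1]]
    return filled


def sliding_windows(start: int, end: int, lengths: List[int], stride: int) -> List[Tuple[int, int]]:
    """Enumerate temporal windows [s, e) of each length that fit inside [start, end)."""
    return [
        (s, s + length)
        for length in lengths
        for s in range(start, end - length + 1, stride)
    ]


def temporal_iou(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Intersection over union of two half-open frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(scored: List[Scored], overlap_thresh: float) -> List[Scored]:
    """Greedily keep the best-scoring windows, dropping any that overlap a kept one too much."""
    kept: List[Scored] = []
    for s, e, score in sorted(scored, key=lambda w: w[2], reverse=True):
        if all(temporal_iou((s, e), (ks, ke)) <= overlap_thresh for ks, ke, _ in kept):
            kept.append((s, e, score))
    return kept


def localize_action(
    track: Track,
    score_fn: Callable[[Track, int, int], float],
    lengths: List[int],
    stride: int = 1,
    overlap_thresh: float = 0.5,
) -> List[Scored]:
    """Temporal search only: the spatial extent in each frame is already fixed by the track."""
    if not track:
        return []
    frames = sorted(track)
    start, end = frames[0], frames[-1] + 1
    scored = [(s, e, score_fn(track, s, e)) for s, e in sliding_windows(start, end, lengths, stride)]
    return temporal_nms(scored, overlap_thresh)


if __name__ == "__main__":
    # Two detections 9 frames apart; frames 1..8 are filled in by interpolation.
    track = interpolate_track({0: (0, 0, 10, 20), 9: (9, 0, 19, 20)})

    # Hypothetical action classifier: responds most strongly to windows centred on frame 5.
    def toy_score(trk: Track, s: int, e: int) -> float:
        return -abs((s + e) / 2 - 5)

    detections = localize_action(track, toy_score, lengths=[4])
    print(detections[0][:2])  # (3, 7)
```

Because `localize_action` only needs a track and a scoring function, the same interpolated track can be passed to several action classifiers in turn, which is the kind of reuse contribution (i) argues makes the decomposition efficient.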