Efficient human action detection using a transferable distance function

Authors:
Weilong Yang;Yang Wang;Greg Mori
Affiliations:
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada;School of Computing Science, Simon Fraser University, Burnaby, BC, Canada;School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Venue:
ACCV'09 Proceedings of the 9th Asian conference on Computer Vision - Volume Part II
Year:
2009

Abstract

In this paper, we address the problem of efficient human action detection from a single template video. We adopt the standard sliding-window approach, scanning the template video against test videos, and represent the template with patch-based motion features. Using generic knowledge learnt from previous training sets, we weight the patches of the template video with a transferable distance function. Based on this patch weighting, we propose a cascade structure that efficiently scans the template video over test videos. We evaluate our method on a human action dataset with cluttered backgrounds and on a ballet video with complex human actions. The experimental results show that the cascade structure not only achieves very reliable detection but also improves the efficiency of patch-based human action detection by an order of magnitude.
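
To make the cascade idea concrete, the following is a minimal sketch of how a weighted-patch cascade might prune sliding-window candidates. It is not the authors' implementation: the function name `cascade_scan`, the stage layout (`stage_sizes`), the rejection thresholds, and the use of a precomputed distance matrix are all assumptions made for illustration. The abstract states only that patches are weighted and that the weighting drives a cascade.

```python
import numpy as np

def cascade_scan(patch_dists, weights, stage_sizes, thresholds):
    """Hypothetical weighted-patch cascade over candidate windows.

    patch_dists : (num_windows, num_patches) distances between each template
                  patch and the matching patch of each candidate window.
                  (Assumption: precomputed here for simplicity; a real system
                  would compute them lazily, only for surviving windows.)
    weights     : (num_patches,) non-negative patch weights.
    stage_sizes : cumulative count of top-weighted patches used by each stage.
    thresholds  : per-stage maximum weighted distance a window may have.
    """
    order = np.argsort(-weights)            # most heavily weighted patches first
    alive = np.arange(patch_dists.shape[0]) # indices of windows not yet rejected
    scores = np.zeros(patch_dists.shape[0])
    evaluations = 0                         # patch comparisons actually needed
    used = 0
    for size, thresh in zip(stage_sizes, thresholds):
        idx = order[used:size]              # patches introduced at this stage
        # Accumulate the weighted distance for surviving windows only.
        scores[alive] += patch_dists[np.ix_(alive, idx)] @ weights[idx]
        evaluations += len(alive) * len(idx)
        alive = alive[scores[alive] <= thresh]  # reject poor matches early
        used = size
    return alive, scores[alive], evaluations

# Toy example: 3 candidate windows, 3 template patches (made-up numbers).
dists = np.array([[0.1, 0.1, 0.1],
                  [5.0, 0.1, 0.1],
                  [0.1, 0.1, 5.0]])
w = np.array([3.0, 1.0, 2.0])
kept, kept_scores, n_eval = cascade_scan(dists, w, [1, 3], [1.0, 2.0])
print(kept, kept_scores, n_eval)
```

In this toy run the first stage uses only the highest-weighted patch and rejects one window immediately, so 7 patch comparisons are needed instead of the 9 an exhaustive scan would require. The saving grows as more windows are rejected in early stages.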