Combining Densely Sampled Form and Motion for Human Action Recognition

Authors:
Konrad Schindler;Luc Van Gool
Affiliations:
BIWI / ETH Zürich, Zürich, Switzerland CH-8092;BIWI / ETH Zürich, Zürich, Switzerland CH-8092 and ESAT / KU Leuven, Heverlee, Belgium B-3001
Venue:
Proceedings of the 30th DAGM symposium on Pattern Recognition
Year:
2008

Abstract

We present a method for human action recognition from video that exploits both form (local shape) and motion (local flow). Inspired by models of the human visual system, the two feature sets are processed independently in separate channels. The form channel extracts a dense local shape representation from every frame, while the motion channel extracts dense optic flow from each frame and its immediate predecessor. The same processing pipeline is applied in both channels: feature maps are pooled locally, down-sampled, and compared to a collection of learnt templates, yielding a vector of similarity scores. In a final step, the two score vectors are merged, and recognition is performed with a discriminative classifier. In an evaluation on two standard datasets, our method outperforms the state of the art, confirming that the combination of form and motion improves recognition.
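
The shared per-channel pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the pooling operator, the similarity measure, or how the score vectors are merged, so max-pooling over non-overlapping cells, cosine similarity, and plain concatenation are assumptions made here for illustration. All function names and the `cell` parameter are hypothetical, and the final discriminative classifier is left out.

```python
import numpy as np


def max_pool_downsample(feature_map, cell=2):
    """Pool a 2-D feature map over non-overlapping cell x cell windows.

    Assumption: max-pooling; taking one value per window also down-samples.
    """
    h, w = feature_map.shape
    h, w = h - h % cell, w - w % cell  # crop so the map tiles evenly
    blocks = feature_map[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.max(axis=(1, 3))


def template_scores(pooled, templates):
    """Compare a pooled map to each template; return one score per template.

    Assumption: cosine similarity stands in for the unspecified measure.
    """
    v = pooled.ravel()
    v = v / (np.linalg.norm(v) + 1e-12)
    scores = []
    for template in templates:
        u = template.ravel()
        u = u / (np.linalg.norm(u) + 1e-12)
        scores.append(float(v @ u))
    return np.array(scores)


def channel_descriptor(feature_map, templates, cell=2):
    """Run one channel: pool, down-sample, then score against templates."""
    return template_scores(max_pool_downsample(feature_map, cell), templates)


def combined_descriptor(form_map, motion_map, form_templates, motion_templates):
    """Merge the two channels' score vectors (assumption: concatenation)."""
    return np.concatenate([
        channel_descriptor(form_map, form_templates),
        channel_descriptor(motion_map, motion_templates),
    ])
```

In this sketch the combined vector would be passed to whatever discriminative classifier is chosen. The form and motion maps stand in for the dense shape and optic-flow features, which are assumed to be computed upstream.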