Exploring probabilistic localized video representation for human action recognition
Multimedia Tools and Applications
Spatio-temporal SIFT and its application to human action classification
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I
In this paper, we propose a systematic framework for action recognition in unconstrained amateur videos. Inspired by the success of local features in object and pose recognition, we extract local static features from sampled frames to capture local pose shape and appearance. In addition, we extract spatiotemporal (ST) features, which have been used successfully in action recognition, to capture local motion. In the recognition phase, we use the Pyramid Match Kernel, which is based on weighted similarities of multi-resolution histograms, to match two videos within the same feature type. To handle complementary but heterogeneous features, i.e., static and motion features, we choose a multi-kernel classifier for feature fusion. To reduce the noise introduced by background clutter, our system also attempts to automatically locate a rough region of interest (the action region). Preliminary tests on the KTH action dataset, the UCF sports dataset, and a YouTube action dataset have shown promising results.
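The matching and fusion steps described above can be illustrated with a minimal sketch. It assumes the standard formulation of the Pyramid Match Kernel (histogram intersections over bins whose side length doubles at each level, with new matches at level i weighted by 1/2^i) and integer-quantized feature vectors. The function names, the normalization, and the fixed fusion weight are hypothetical choices made for illustration; the abstract does not specify how the paper's multi-kernel classifier combines or learns its kernels.

```python
from collections import Counter
from math import sqrt

import numpy as np


def pyramid_match(X, Y, num_levels):
    """Unnormalized pyramid match score between two feature sets.

    X, Y: arrays of shape (n, d) holding non-negative integer features.
    Level i uses bins of side length 2**i; matches first found at
    level i are weighted by 1 / 2**i.
    """
    X = np.asarray(X, dtype=np.int64)
    Y = np.asarray(Y, dtype=np.int64)
    score = 0.0
    prev_matches = 0
    for level in range(num_levels):
        side = 2 ** level
        # Multi-resolution histograms: count the features per bin.
        hist_x = Counter(map(tuple, X // side))
        hist_y = Counter(map(tuple, Y // side))
        # Histogram intersection gives the number of matches at this level.
        matches = sum(min(c, hist_y[b]) for b, c in hist_x.items())
        # Only matches that are new at this resolution contribute.
        score += (matches - prev_matches) / side
        prev_matches = matches
    return score


def normalized_pyramid_match(X, Y, num_levels):
    """Pyramid match score scaled by the self-similarity of each set."""
    k_xy = pyramid_match(X, Y, num_levels)
    k_xx = pyramid_match(X, X, num_levels)
    k_yy = pyramid_match(Y, Y, num_levels)
    return k_xy / sqrt(k_xx * k_yy)


def fuse_kernels(k_static, k_motion, weight=0.5):
    """Assumed fusion rule: a fixed convex combination of two kernel values.

    A multi-kernel classifier would typically learn such weights;
    the fixed value here is only a placeholder.
    """
    return weight * k_static + (1.0 - weight) * k_motion


# Two toy "videos", each described by two quantized 2-D features.
video_a = [[0, 0], [5, 5]]
video_b = [[0, 0], [4, 4]]
similarity = normalized_pyramid_match(video_a, video_b, num_levels=4)
```

In this toy case one feature pair matches at the finest level and the other only after the bins double in size, so the second pair contributes half as much to the score. In a full system, one such kernel value would be computed per feature type (static and motion) for every pair of videos, and the fused kernel matrix would be passed to a kernel classifier.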