Local-feature-based approaches have become popular for activity recognition. A local feature captures the movement and appearance of a small region in a video and can therefore be ambiguous; e.g., it cannot tell whether a movement comes from a person's hand or foot when the camera is far from the person. To better distinguish different types of activities, researchers have proposed combining local features to encode the relationships between local movements. Due to computational limits, however, previous work only forms combinations from features that are neighbors in space and/or time. In this paper, we propose an approach that efficiently identifies both local and long-range motion interactions. Taking the "push" activity as an example, our approach can capture the combination of one person's hand movement and another person's foot response, whose local features are far apart in both space and time. Our computational complexity is linear in the number of local features in a video. Extensive experiments show that, compared to a number of state-of-the-art methods, our approach is effective for recognizing a wide variety of activities, including activities spanning long durations.
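To illustrate how long-range feature combinations can be formed in linear time, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): each local feature, represented as an `(x, y, t, word)` tuple where `word` indexes a quantized motion/appearance codebook, is paired with per-codeword space-time centroids rather than with every other feature. Two passes over the features suffice, so the cost stays linear in the number of features while still relating features that are far apart in space and time. All names here are assumptions introduced for illustration.

```python
from collections import defaultdict

def compound_features(features):
    """Pair each local feature with per-codeword space-time centroids
    (a hypothetical stand-in for pairwise combinations), keeping the
    cost linear in the number of features.

    `features` is a list of (x, y, t, word) tuples; `word` indexes a
    codebook of quantized local descriptors (assumed, for illustration).
    """
    # First pass: accumulate a space-time centroid for each codeword.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, t, w in features:
        s = sums[w]
        s[0] += x; s[1] += y; s[2] += t; s[3] += 1
    centroids = {w: (s[0] / s[3], s[1] / s[3], s[2] / s[3])
                 for w, s in sums.items()}
    # Second pass: describe each feature by its space-time offset to
    # every codeword centroid, capturing both local and long-range
    # relations (e.g., a "hand" feature's offset to distant "foot"
    # features of another person).
    compounds = []
    for x, y, t, w in features:
        rel = {v: (cx - x, cy - y, ct - t)
               for v, (cx, cy, ct) in centroids.items()}
        compounds.append((w, rel))
    return compounds
```

Because each feature relates to a fixed number of codeword summaries instead of all other features, the pairing step avoids the quadratic blow-up that limited earlier work to neighboring features.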