This paper presents an exemplar-based approach to detecting and localizing human actions, such as running, cycling, and swinging, in realistic videos with dynamic backgrounds. We show that such activities can be compactly represented as time series of a few snapshots of human-body parts in their most discriminative postures, relative to other activity classes. This enables our approach to efficiently store multiple diverse exemplars per activity class, and to quickly retrieve the exemplars that best match a query by aligning their short time-series representations. Given a set of example videos of all activity classes, we extract multiscale regions from all their frames, and then learn a sparse dictionary of the most discriminative regions. The Viterbi algorithm is then used to track detections of the learned codewords across the frames of each video, yielding their compact time-series representations. Dictionary learning is cast within the large-margin framework, wherein we study the effects of l1 and l2 regularization on the sparseness of the resulting dictionaries. Our experiments demonstrate the robustness and scalability of our approach on challenging YouTube videos.
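The Viterbi step above can be sketched as follows. This is a generic illustration, not the authors' exact formulation: per-frame codeword detections are linked into a single best track, with emission scores given by detection confidence and a hypothetical Gaussian displacement penalty (parameter `motion_sigma`) standing in for the paper's unspecified transition model.

```python
import numpy as np

def viterbi_track(scores, positions, motion_sigma=20.0):
    """Link per-frame codeword detections into one smooth track.

    scores[t]    -- detection scores of the candidates in frame t (1-D array)
    positions[t] -- (x, y) centers of those candidates (N_t x 2 array)
    Returns the index of the chosen candidate in each frame.
    """
    T = len(scores)
    # delta[t][i]: best cumulative score of a track ending at candidate i of frame t
    delta = [np.asarray(scores[0], dtype=float)]
    back = []
    for t in range(1, T):
        # pairwise displacement penalty between frames t-1 and t
        d = np.linalg.norm(
            positions[t][None, :, :] - positions[t - 1][:, None, :], axis=2)
        trans = -(d ** 2) / (2.0 * motion_sigma ** 2)
        total = delta[-1][:, None] + trans          # shape (N_{t-1}, N_t)
        back.append(np.argmax(total, axis=0))       # best predecessor per candidate
        delta.append(np.max(total, axis=0) + np.asarray(scores[t], dtype=float))
    # backtrack the globally best path
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With strong, spatially coherent detections, the recovered path follows them: three frames whose first candidate is both high-scoring and nearly stationary yield the track `[0, 0, 0]`, even if a distant candidate exists in every frame.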
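The contrast between l1 and l2 regularization noted in the abstract can be seen in their proximal operators, which a proximal-gradient solver for the dictionary objective would apply at each step. This is a minimal sketch of the standard operators, not the authors' solver: soft-thresholding (l1) drives small coefficients exactly to zero, producing sparse dictionaries, while l2 only shrinks coefficients uniformly and never zeroes them.

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: proximal operator of lam * ||w||_1.
    Coefficients with |w_i| <= lam are set exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def prox_l2(w, lam):
    """Proximal operator of (lam / 2) * ||w||_2^2.
    Uniform multiplicative shrinkage; no coefficient becomes exactly zero."""
    return w / (1.0 + lam)
```

For `w = [0.05, -0.5, 2.0]` and `lam = 0.1`, `prox_l1` returns `[0.0, -0.4, 1.9]` (one exact zero), whereas `prox_l2` merely rescales every entry, which is why l1 yields the sparser dictionaries.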
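Retrieval by aligning short time-series representations can be illustrated with dynamic time warping, a standard alignment technique used here as a stand-in (the paper's exact alignment method is not specified in the abstract). Each sequence element is a codeword feature vector; the cumulative cost of the best monotone alignment serves as the query-to-exemplar distance.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping cost between two sequences of feature vectors.
    Lower cost means the query aligns better with the exemplar."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local cost: Euclidean distance between aligned elements
            cost = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            # extend the cheapest of the three admissible predecessor alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Because the time series are short (a few discriminative snapshots per exemplar), this quadratic alignment stays cheap, which is what makes scanning many stored exemplars per class practical.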