With nearly one billion online videos viewed every day, an emerging frontier in computer vision research is recognition and search in video. While much effort has been devoted to collecting and annotating large, scalable static image datasets containing thousands of image categories, human action datasets lag far behind. Current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling, so there is a need to design and create new benchmarks. To address this issue we collected the largest action video database to date, with 51 action categories comprising around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube. We use this database to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions such as camera motion, viewpoint, video quality, and occlusion.