Action bank: A high-level representation of activity in video

Authors:
Jason J. Corso
Affiliations:
Computer Science and Engineering, SUNY at Buffalo
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Citing 0
Cited 23

Human activities as stochastic kronecker graphs
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II

Motion interchange patterns for action recognition in unconstrained videos
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

Classifier ensemble recommendation
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I

Atomic action features: a new feature for action recognition
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I

Unsupervised temporal commonality discovery
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV

A survey of video datasets for human action and activity recognition
Computer Vision and Image Understanding

Recommendations for video event recognition using concept vocabularies
Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

Searching informative concept banks for video event detection
Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

Scalable crowd-sourcing of video from mobile devices
Proceeding of the 11th annual international conference on Mobile systems, applications, and services

Spatiotemporal salience via centre-surround comparison of visual spacetime orientations
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part III

A comparative study of encoding, pooling and normalization methods for action recognition
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part III

Learning latent spatio-temporal compositional model for human action recognition
Proceedings of the 21st ACM international conference on Multimedia

Exploring discriminative pose sub-patterns for effective action classification
Proceedings of the 21st ACM international conference on Multimedia

Time matters!: capturing variation in time in video using fisher kernels
Proceedings of the 21st ACM international conference on Multimedia

Human action recognition with salient trajectories
Signal Processing

Synchronization of user-generated videos through trajectory correspondence and a refinement procedure
Proceedings of the 10th European Conference on Visual Media Production

Violent scene detection using mid-level feature
Proceedings of the Fourth Symposium on Information and Communication Technology

Editor's Choice Article: Human activity recognition in videos using a single example
Image and Vision Computing

Matching mixtures of curves for human action recognition
Computer Vision and Image Understanding

Evaluating multimedia features and fusion for example-based event detection
Machine Vision and Applications

Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos
Machine Vision and Applications

Detecting People Looking at Each Other in Videos
International Journal of Computer Vision

Activity representation with motion hierarchies
International Journal of Computer Vision

Abstract

Activity recognition in video is dominated by low- and mid-level features. These features are demonstrably capable but, by nature, carry little semantic meaning. Inspired by the recent object bank approach to image representation, we present Action Bank, a new high-level representation of video. Action Bank comprises many individual action detectors sampled broadly in semantic space as well as viewpoint space. Our representation is constructed to be semantically rich and, even when paired with simple linear SVM classifiers, is capable of highly discriminative performance. We have tested Action Bank on four major activity recognition benchmarks. In all cases, our performance is better than the state of the art: 98.2% on KTH (better by 3.3%), 95.0% on UCF Sports (better by 3.7%), 57.9% on UCF50 (baseline is 47.9%), and 26.9% on HMDB51 (baseline is 23.2%). Furthermore, when we analyze the classifiers, we find strong transfer of semantics from the constituent action detectors to the bank classifier.
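
The idea of a bank-style representation can be illustrated with a minimal sketch. It assumes each action detector has already produced a response volume over (time, height, width) for a video; the pooling scheme (one global maximum plus the maximum of each of the eight octants), the function names `pool_response` and `bank_feature`, and the random stand-in data are all illustrative assumptions, not details taken from the abstract. The resulting vector is the kind of input that could then be passed to a linear SVM.

```python
import numpy as np


def pool_response(volume: np.ndarray) -> np.ndarray:
    """Max-pool one detector's response volume of shape (T, H, W).

    Returns 9 values: the global maximum followed by the maximum of each
    of the 8 octants. This pooling layout is an assumption for illustration.
    """
    t, h, w = volume.shape
    feats = [volume.max()]
    for ts in (slice(0, t // 2), slice(t // 2, t)):
        for hs in (slice(0, h // 2), slice(h // 2, h)):
            for ws in (slice(0, w // 2), slice(w // 2, w)):
                feats.append(volume[ts, hs, ws].max())
    return np.array(feats)


def bank_feature(responses) -> np.ndarray:
    """Concatenate pooled responses from every detector in the bank."""
    return np.concatenate([pool_response(v) for v in responses])


# Hypothetical stand-in data: 5 detectors, each with an 8x16x16 response volume.
rng = np.random.default_rng(0)
responses = [rng.random((8, 16, 16)) for _ in range(5)]

feature = bank_feature(responses)
print(feature.shape)  # (45,) = 5 detectors x 9 pooled values
```

Each block of nine entries in `feature` belongs to one detector, so a linear classifier's weights can be read back per detector, which is one way to inspect the kind of semantic transfer the abstract mentions.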