The field of action recognition has seen a large increase in activity in recent years. Much of this progress has come from adapting ideas from single-frame object recognition to temporal, action-based recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and in action recognition these features are often engineered to fire sparsely. This keeps the problem tractable, but can sacrifice recognition accuracy, since there is no guarantee that the features most discriminative between classes survive such sparse detection. In contrast, we propose to start from an overcomplete set of simple 2D corners in both space and time. These are grouped spatially and temporally by a hierarchical process with a progressively larger search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining, which allows large amounts of data to be searched for frequently reoccurring patterns of features. At each level of the hierarchy, the mined compound features become more complex, more discriminative, and sparser; because the compound features are constructed and selected for their discriminative power, both speed and accuracy increase at each level, yielding fast, accurate recognition with real-time performance on high-resolution video. The approach is tested on four data sets: the popular KTH data set, to provide a comparison with other state-of-the-art approaches; the Multi-KTH data set, to illustrate simultaneous classification of multiple actions, despite no explicit localization information being provided during training; and the recent Hollywood and Hollywood2 data sets, which provide challenging complex actions taken from commercial movie sequences.
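The starting point described above — an overcomplete set of simple 2D corners detected in both the spatial and temporal planes of the video volume — can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the Harris-style response, the box-filter window, and the choice of xy and xt slicing planes are all stand-ins chosen for brevity.

```python
import numpy as np

def harris_response(img, k=0.04, r=1):
    """Harris-style corner response on a 2D slice (illustrative only)."""
    iy, ix = np.gradient(img.astype(float))  # gradients along rows, cols
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box(a):  # cheap box-filter smoothing as a stand-in for a Gaussian window
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out / (2 * r + 1) ** 2

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

def dense_corners(video, thresh=1e-4):
    """Run the 2D detector over spatial (xy) frames and temporal (xt)
    slices, returning an overcomplete set of (t, y, x) interest points."""
    pts = set()
    n_t, n_y, n_x = video.shape
    for t in range(n_t):                      # corners in space
        resp = harris_response(video[t])
        for y, x in zip(*np.where(resp > thresh)):
            pts.add((t, int(y), int(x)))
    for y in range(n_y):                      # corners in time (xt planes)
        resp = harris_response(video[:, y, :])
        for t, x in zip(*np.where(resp > thresh)):
            pts.add((int(t), y, int(x)))
    return sorted(pts)
```

In the paper's pipeline these simple corners are deliberately dense and individually weak; the discriminative power comes from the mined groupings, not from the detector itself.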
For all four data sets, the proposed hierarchical approach outperforms all other methods reported in the literature to date, while achieving real-time operation.
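The data-mining step that searches large amounts of data for frequently reoccurring patterns of features can be sketched with a minimal Apriori-style frequent-itemset miner. The encoding of each local neighbourhood as a transaction of quantized feature ids is an assumption here; the abstract does not fix a particular mining algorithm or transaction format.

```python
from collections import Counter

def apriori(transactions, min_support):
    """Return all itemsets (frozensets of feature ids) that occur in at
    least min_support transactions, grown level by level (Apriori)."""
    tx = [frozenset(t) for t in transactions]
    # level 1: frequent single features
    counts = Counter(item for t in tx for item in t)
    frequent = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # candidate generation: join pairs of frequent (k-1)-itemsets
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # support counting: keep candidates contained in enough transactions
        frequent = {c for c in candidates
                    if sum(c <= t for t in tx) >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent
```

Each pass grows the surviving itemsets by one feature, mirroring the paper's hierarchy: compound features become larger, rarer, and more discriminative at each level, while infrequent combinations are pruned early and never expanded.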