Discriminative space-time voting for joint recognition and localization of actions.

Authors:
Antonios Oikonomopoulos;Ioannis Patras;Maja Pantic
Affiliations:
Imperial College London, London, United Kingdom;Queen Mary University of London, London, United Kingdom;Imperial College London, London, United Kingdom and University of Twente, Netherlands
Venue:
Proceedings of the 2nd international workshop on Social signal processing
Year:
2010

Citing 10
Cited 0

Tracking and data association

Tracking and data association
Mean Shift, Mode Seeking, and Clustering

IEEE Transactions on Pattern Analysis and Machine Intelligence
Space-time Interest Points

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Discovering Objects and their Localization in Images

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Detecting Irregularities in Images and in Video

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Behavior recognition via sparse spatio-temporal features

ICCCN '05 Proceedings of the 14th International Conference on Computer Communications and Networks
A survey on vision-based human action recognition

Image and Vision Computing
Human computing and machine understanding of human behavior: a survey

ICMI'06/IJCAI'07 Proceedings of the ICMI 2006 and IJCAI 2007 international conference on Artifical intelligence for human computing
Spatiotemporal salient points for visual recognition of human actions

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we address the problem of activity detection in unsegmented image sequences. Our main contribution is the use of an implicit representation of the spatiotemporal shape of the activity which relies on the spatiotemporal localization of characteristic ensembles of feature descriptors. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. We use boosting in order to select characteristic ensembles per class. This leads to a set of class specific codebooks where each codeword is an ensemble of features. During training, we store the spatial positions of the codeword ensembles with respect to a set of reference points, and their temporal positions with respect to the start and end of the action instance. During testing, each activated codeword casts votes concerning the spatiotemporal position and extend of the action, using the information stored during training. Mean Shift mode estimation in the voting space provides the most probable hypotheses concerning the localization of the subjects at each frame, as well as the extend of the activities depicted in the image sequences. We present experimental results for a number of publicly available datasets, that demonstrate the efficiency of the proposed method in localizing and classifying human activities.