View-Independent Action Recognition from Temporal Self-Similarities

  • Authors:
  • Imran N. Junejo; Emilie Dexter; Ivan Laptev; Patrick Pérez

  • Affiliations:
  • University of Sharjah, Sharjah, UAE; INRIA Rennes-Bretagne Atlantique, Campus Universitaire de Beaulieu, Rennes, France; INRIA Paris-Rocquencourt/ENS, Paris, France; Thomson R&D, Cesson-Sévigné, France

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2011

Abstract

This paper addresses the recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building on this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Although this temporal self-similarity descriptor is not strictly view-invariant, we provide intuition and experimental validation demonstrating its high stability under view changes. Self-similarity descriptors are also shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored. If required, such fluctuations between two instances of the same action class can be explicitly recovered with dynamic time warping, as we demonstrate, to achieve cross-view action synchronization. More central to the present work, the temporal ordering of local self-similarity descriptors can simply be ignored within a bag-of-features approach; enough action discrimination is retained this way to build a view-independent action recognition system. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multiview correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public data sets. It performs comparably to or better than related methods and works well even under extreme conditions, such as recognizing actions from top views when only side views are used for training.
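
To make the core construction concrete, the sketch below shows, in Python/NumPy, how a temporal self-similarity matrix (SSM) can be built from per-frame feature vectors, together with a simple per-frame local descriptor along the SSM diagonal. This is a minimal illustration, not the authors' implementation: the histogram descriptor is a stand-in for the paper's log-polar patch descriptor, and all names and parameters (self_similarity_matrix, local_descriptors, radius, n_bins) are illustrative assumptions.

    import numpy as np

    def self_similarity_matrix(features):
        """Temporal self-similarity matrix (SSM) for one action sequence.

        features: (T, D) array, one D-dimensional feature vector per frame
                  (e.g., stacked 2D joint positions, or an appearance/flow
                  descriptor of the person-centered bounding box).
        Returns a (T, T) matrix of pairwise Euclidean distances between
        frames; the diagonal is zero by construction.
        """
        diffs = features[:, None, :] - features[None, :, :]
        return np.linalg.norm(diffs, axis=-1)

    def local_descriptors(ssm, radius=10, n_bins=8):
        """Simple local descriptors sampled along the SSM diagonal.

        Illustrative stand-in for the paper's log-polar patch descriptor:
        each frame i is described by a normalized histogram of SSM values
        in a square window centered at (i, i).
        """
        T = ssm.shape[0]
        descs = []
        for i in range(radius, T - radius):
            patch = ssm[i - radius:i + radius + 1, i - radius:i + radius + 1]
            hist, _ = np.histogram(patch, bins=n_bins,
                                   range=(0.0, ssm.max() + 1e-9))
            descs.append(hist / max(hist.sum(), 1))
        return np.array(descs)

    # Toy usage: a synthetic periodic "action" yields a periodic SSM.
    t = np.linspace(0, 4 * np.pi, 120)
    traj = np.stack([np.sin(t), np.cos(2 * t)], axis=1)   # (T, 2) trajectory
    ssm = self_similarity_matrix(traj)
    descs = local_descriptors(ssm)
    print(ssm.shape, descs.shape)                          # (120, 120) (100, 8)

In a bag-of-features pipeline, these per-frame descriptors would then be quantized against a learned codebook and pooled into an order-free histogram per sequence, matching the abstract's observation that temporal ordering can be discarded while retaining enough discrimination for cross-view recognition.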