Human action recognition in video by fusion of structural and spatio-temporal features

  • Authors:
  • Ehsan Zare Borzeshi; Oscar Perez Concha; Massimo Piccardi

  • Affiliations:
  • School of Computing and Communications, Faculty of Engineering and IT, University of Technology, Sydney (UTS), Sydney, Australia; Centre for Health Informatics, Australian Institute of Health Innovation, University of New South Wales (UNSW), Sydney, Australia; School of Computing and Communications, Faculty of Engineering and IT, University of Technology, Sydney (UTS), Sydney, Australia

  • Venue:
  • SSPR'12/SPR'12 Proceedings of the 2012 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition
  • Year:
  • 2012

Abstract

The problem of human action recognition has received increasing attention in recent years owing to its importance in many applications. Local representations, and in particular STIP descriptors, have gained popularity for action recognition. Yet, the main limitation of those approaches is that they do not capture the spatial relationships within the subject performing the action. This paper proposes a novel method based on the fusion of the global spatial relationships provided by graph embedding and the local spatio-temporal information of STIP descriptors. Experiments on an action recognition dataset reported in the paper show that recognition accuracy can be significantly improved by combining the structural information with the spatio-temporal features.
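The abstract describes fusing a global structural representation (a graph-embedding vector) with local spatio-temporal STIP features, but does not specify the fusion scheme. A minimal sketch of one common choice, early (feature-level) fusion by normalised concatenation, is shown below; the function name and the toy vectors are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def fuse_features(struct_vec, stip_hist):
    """Early (feature-level) fusion: concatenate a graph-embedding
    structural vector with a STIP bag-of-features histogram.
    Both inputs are L2-normalised first so neither modality dominates.
    (Illustrative sketch only; not the method described in the paper.)"""
    s = struct_vec / (np.linalg.norm(struct_vec) + 1e-12)
    t = stip_hist / (np.linalg.norm(stip_hist) + 1e-12)
    return np.concatenate([s, t])

# Hypothetical toy data: a 3-D structural embedding and a 4-bin STIP histogram
fused = fuse_features(np.array([1.0, 2.0, 2.0]),
                      np.array([0.0, 3.0, 4.0, 0.0]))
print(fused.shape)  # (7,)
```

The fused vector can then be passed to any standard classifier; the normalisation step is one simple way to balance the two feature types before concatenation.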