Keep it simple and sparse: real-time action recognition

Authors:
Sean Ryan Fanello;Ilaria Gori;Giorgio Metta;Francesca Odone
Affiliations:
iCub Facility, Istituto Italiano di Tecnologia, Genova, Italia;iCub Facility, Istituto Italiano di Tecnologia, Genova, Italia;iCub Facility, Istituto Italiano di Tecnologia, Genova, Italia;Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi, Università degli Studi di Genova, Genova, Italia
Venue:
The Journal of Machine Learning Research
Year:
2013


Abstract

Sparsity has been shown to be one of the most important properties for visual recognition. In this paper we show that sparse representation plays a fundamental role in achieving one-shot learning and real-time recognition of actions. We start from RGBD images, combine motion and appearance cues, and extract state-of-the-art features in a computationally efficient way. The proposed method relies on descriptors based on 3D Histograms of Scene Flow (3DHOFs) and Global Histograms of Oriented Gradient (GHOGs); adaptive sparse coding is applied to capture high-level patterns from the data. We then propose a method for simultaneous on-line video segmentation and action recognition using linear SVMs. The main contribution of the paper is an effective real-time system for one-shot action modeling and recognition; the paper also highlights the effectiveness of sparse coding techniques for representing 3D actions. We obtain very good results on three different data sets: a benchmark data set for one-shot action learning (the ChaLearn Gesture Data Set), an in-house data set acquired with a Kinect sensor that includes complex actions and gestures differing in small details, and a data set created for human-robot interaction purposes. Finally, we demonstrate that our system is also effective in a human-robot interaction setting and propose a memory game, "All Gestures You Can", to be played against a humanoid robot.
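
To make the sparse coding step concrete, here is a minimal sketch in Python of encoding one frame descriptor as a sparse combination of dictionary atoms. It is an illustration under stated assumptions, not the authors' implementation: the abstract says the coding is adaptive, whereas this sketch uses a fixed toy dictionary (`toy_atoms`), a made-up 4-dimensional descriptor, and ISTA (iterative soft-thresholding) as a generic solver for the l1-regularized least-squares problem. The names `soft_threshold`, `sparse_code`, `lam`, and `step` are hypothetical.

```python
from typing import List

Vector = List[float]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x: Vector, atoms: List[Vector], lam: float = 0.1,
                step: float = 0.5, n_iter: int = 200) -> Vector:
    """ISTA for min_a 0.5 * ||x - sum_k a[k] * atoms[k]||^2 + lam * ||a||_1.

    Assumption: `step` is small enough for the given dictionary
    (at most 1 / L, with L the largest eigenvalue of D^T D).
    """
    a = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Reconstruction and residual under the current code.
        recon = [sum(a[k] * atoms[k][i] for k in range(len(atoms)))
                 for i in range(len(x))]
        resid = [recon[i] - x[i] for i in range(len(x))]
        # Gradient step on the quadratic term, then soft-thresholding.
        for k, atom in enumerate(atoms):
            grad_k = sum(atom[i] * resid[i] for i in range(len(x)))
            a[k] = soft_threshold(a[k] - step * grad_k, step * lam)
    return a


# Toy orthonormal dictionary (hypothetical; a real one would be learned
# from descriptors such as 3DHOF/GHOG histograms).
toy_atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
descriptor = [1.0, 0.05, 0.0, 0.0]  # made-up frame descriptor

code = sparse_code(descriptor, toy_atoms, lam=0.1, step=0.5, n_iter=200)
print(code)
```

With this toy dictionary the small second component of the descriptor falls below the threshold `lam` and is set exactly to zero, so only the first atom stays active. Codes of this kind are what a linear classifier such as a linear SVM would take as input in a pipeline like the one the abstract describes.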