Recently, the low-cost Microsoft Kinect sensor, which captures real-time high-resolution RGB and depth visual information, has attracted increasing attention for a wide range of computer vision applications. Existing techniques extract hand-tuned features from the RGB and depth data separately and fuse them heuristically, which does not fully exploit the complementarity of the two data sources. In this paper, we introduce an adaptive learning methodology that automatically extracts (holistic) spatio-temporal features from RGB-D video data, fusing the RGB and depth information simultaneously, for visual recognition tasks. We formulate this as an optimization problem solved by our proposed restricted graph-based genetic programming (RGGP) approach, in which a group of primitive 3D operators is first randomly assembled into graph-based combinations and then evolved generation by generation by evaluating them on a set of RGB-D video samples. Finally, the best-performing combination is selected as the (near-)optimal representation for a pre-defined task. The proposed method is systematically evaluated on SKIG, a new hand gesture dataset that we collected ourselves, and on the public MSR Daily Activity 3D dataset. Extensive experimental results show that our approach yields significant advantages over state-of-the-art hand-crafted and machine-learned features.
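The evolutionary loop the abstract describes — randomly assembling primitive operators into candidate combinations, evolving them generation by generation against evaluation samples, and keeping the best-performing combination — can be sketched as follows. This is a toy illustration, not the paper's RGGP: the primitive set, the chain (rather than graph) encoding, and the numeric fitness are all stand-ins for the paper's 3D spatio-temporal operators and recognition accuracy on RGB-D videos.

```python
import random

# Hypothetical stand-in primitives; the paper's actual operator set
# (3D operators over RGB-D volumes) is not specified in the abstract.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
    "half": lambda x: x / 2,
}

def random_program(length=4):
    # A candidate is a chain of primitives (a degenerate "graph").
    return [random.choice(list(PRIMITIVES)) for _ in range(length)]

def run(program, x):
    # Apply the operator chain to one input sample.
    for op in program:
        x = PRIMITIVES[op](x)
    return x

def fitness(program, samples):
    # Toy surrogate for recognition accuracy: negative total error
    # between the program's output and a target (higher is better).
    return -sum(abs(run(program, x) - target) for x, target in samples)

def mutate(program):
    # Replace one randomly chosen operator in the chain.
    child = program[:]
    child[random.randrange(len(child))] = random.choice(list(PRIMITIVES))
    return child

def evolve(samples, pop_size=20, generations=30, seed=0):
    # Generation-by-generation loop: evaluate, keep the fitter half,
    # refill the population with mutated survivors.
    random.seed(seed)
    pop = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, samples), reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    # Return the best-performing combination found.
    return max(pop, key=lambda p: fitness(p, samples))
```

In the paper's setting, the fitness call would instead train and score a classifier on the features each candidate extracts from RGB-D video samples, and the encoding would permit general graph-structured combinations rather than simple chains.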