Recently, the low-cost Microsoft Kinect sensor, which captures real-time high-resolution RGB and depth visual information, has attracted increasing attention for a wide range of computer vision applications. Existing techniques extract hand-tuned features from the RGB and depth data separately and fuse them heuristically, which does not fully exploit the complementarity of the two data sources. In this paper, we introduce an adaptive learning methodology that automatically extracts (holistic) spatio-temporal features from RGB-D video data, fusing the RGB and depth information simultaneously, for visual recognition tasks. We formulate this as an optimization problem solved by our proposed restricted graph-based genetic programming (RGGP) approach, in which a group of primitive 3D operators is first randomly assembled into graph-based combinations and then evolved generation by generation by evaluating them on a set of RGB-D video samples. Finally, the best-performing combination is selected as the (near-)optimal representation for a pre-defined task. The proposed method is systematically evaluated on SKIG, a new hand gesture dataset that we collected ourselves, and on the public MSR Daily Activity 3D dataset. Extensive experimental results show that our approach yields significant advantages over state-of-the-art hand-crafted and machine-learned features.
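The evolutionary loop the abstract describes — randomly assembling primitive operators into candidate combinations, evolving them generation by generation against evaluation samples, and keeping the best-performing combination — can be sketched as follows. This is a toy illustration, not the paper's RGGP: the primitive set, the chain (rather than graph) encoding, and the numeric fitness are all stand-ins for the paper's 3D spatio-temporal operators and recognition accuracy on RGB-D videos.

```python
import random

# Hypothetical stand-in primitives; the paper's actual operator set
# (3D operators over RGB-D volumes) is not specified in the abstract.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
    "half": lambda x: x / 2,
}

def random_program(length=4):
    # A candidate is a chain of primitives (a degenerate "graph").
    return [random.choice(list(PRIMITIVES)) for _ in range(length)]

def run(program, x):
    # Apply the operator chain to one input sample.
    for op in program:
        x = PRIMITIVES[op](x)
    return x

def fitness(program, samples):
    # Toy surrogate for recognition accuracy: negative total error
    # between the program's output and a target (higher is better).
    return -sum(abs(run(program, x) - target) for x, target in samples)

def mutate(program):
    # Replace one randomly chosen operator in the chain.
    child = program[:]
    child[random.randrange(len(child))] = random.choice(list(PRIMITIVES))
    return child

def evolve(samples, pop_size=20, generations=30, seed=0):
    # Generation-by-generation loop: evaluate, keep the fitter half,
    # refill the population with mutated survivors.
    random.seed(seed)
    pop = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, samples), reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    # Return the best-performing combination found.
    return max(pop, key=lambda p: fitness(p, samples))
```

In the paper's setting, the fitness call would instead train and score a classifier on the features each candidate extracts from RGB-D video samples, and the encoding would permit general graph-structured combinations rather than simple chains.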