View-invariant gesture recognition using 3D optical flow and harmonic motion context

  • Authors:
  • M. B. Holte; T. B. Moeslund; P. Fihl

  • Affiliations:
  • Computer Vision and Media Technology Laboratory, Aalborg University, Niels Jernes Vej 14, DK-9220 Aalborg, Denmark (all authors)

  • Venue:
  • Computer Vision and Image Understanding
  • Year:
  • 2010


Abstract

This paper presents an approach to view-invariant gesture recognition. The approach is based on 3D data captured by a SwissRanger SR4000 camera, which produces both a depth map and an intensity image of the scene. Since the two information types are aligned, the intensity image can be used to define a region of interest for the relevant 3D data. This data fusion improves the quality of the motion detection and hence results in better recognition. The gesture recognition is based on finding motion primitives (temporal instances) in the 3D data. Motion is detected by a 3D version of optical flow, which results in velocity-annotated point clouds. The 3D motion primitives are represented efficiently by introducing motion context. The motion context is transformed into a view-invariant representation using spherical harmonic basis functions, yielding a harmonic motion context representation. A probabilistic Edit Distance classifier is applied to identify which gesture best describes a string of primitives. The approach is trained on data from one viewpoint and tested on data from a very different viewpoint. The recognition rate is 94.4%, which is similar to the recognition rate when training and testing on gestures from the same viewpoint; hence the approach is indeed view-invariant.
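The key idea behind the harmonic representation is that expanding a function on the sphere in spherical harmonics, and then keeping only the per-degree energies of the expansion, yields a descriptor that is invariant to 3D rotation of the underlying data, and hence to viewpoint changes. The sketch below illustrates that principle only; it is not the paper's implementation. The `sphere_function` kernel density, the sample points, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.special import sph_harm

def sphere_function(points, theta, phi):
    """Illustrative stand-in for one shell of a motion context: a smooth
    kernel density of the points' directions, evaluated on a (theta, phi)
    grid (theta = polar angle, phi = azimuth)."""
    dirs = points / np.linalg.norm(points, axis=1, keepdims=True)
    grid = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
    # Gaussian-like kernel on the dot product (angular proximity).
    return sum(np.exp(10.0 * (grid @ d)) for d in dirs)

def sh_energy_descriptor(f, theta, phi, l_max=6):
    """Per-degree spherical-harmonic energies sum_m |<f, Y_lm>|^2.
    These energies are unchanged by a 3D rotation of f."""
    # Quadrature weights for the sphere: sin(theta) dtheta dphi.
    dtheta = theta[1, 0] - theta[0, 0]
    dphi = phi[0, 1] - phi[0, 0]
    w = np.sin(theta) * dtheta * dphi
    energies = []
    for l in range(l_max + 1):
        e = 0.0
        for m in range(-l, l + 1):
            # scipy's sph_harm takes (m, l, azimuth, polar).
            Y = sph_harm(m, l, phi, theta)
            c = np.sum(f * np.conj(Y) * w)  # coefficient <f, Y_lm>
            e += abs(c) ** 2
        energies.append(e)
    return np.array(energies)
```

Computing the descriptor for a point set and for a rotated copy of it gives nearly identical energy vectors, which is why the representation can be trained on one viewpoint and tested on another.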