Continuous body and hand gesture recognition for natural human-computer interaction

  • Authors: Yale Song, David Demirdjian, Randall Davis
  • Affiliations: Massachusetts Institute of Technology, Cambridge, MA (all authors)
  • Venue: ACM Transactions on Interactive Intelligent Systems (TiiS) - Special Issue on Affective Interaction in Natural Environments
  • Year: 2012

Abstract

Intelligent gesture recognition systems open a new era of natural human-computer interaction: gesturing is instinctive and a skill we all have, so it requires little or no thought, leaving the focus on the task itself, as it should be, rather than on the interaction modality. We present a new approach to gesture recognition that attends to both body and hands, and that interprets gestures continuously from an unsegmented and unbounded input stream. This article describes the whole procedure of continuous body and hand gesture recognition, from signal acquisition and processing to the interpretation of the processed signals. Our system takes a vision-based approach, tracking body and hands using a single stereo camera. Body postures are reconstructed in 3D space using a generative model-based approach with a particle filter, combining both static and dynamic attributes of motion in the input feature to make tracking robust to self-occlusion. The reconstructed body postures guide the search for hands. Hand shapes are classified into one of several canonical hand shapes using an appearance-based approach with a multiclass support vector machine. Finally, the extracted body and hand features are combined and used as the input feature for gesture recognition. We formulate the task as an online sequence labeling and segmentation problem. A latent-dynamic conditional random field is used with a temporal sliding window to perform the task continuously. We augment this with a novel technique called multilayered filtering, which filters both the input layer and the prediction layer. Filtering the input layer captures long-range temporal dependencies and reduces input signal noise; filtering the prediction layer takes weighted votes over multiple overlapping prediction results and reduces estimation noise. We tested our system in a scenario of real-world gestural interaction using the NATOPS dataset, an official vocabulary of aircraft handling gestures. Our experimental results show that (1) using both static and dynamic attributes of motion in body tracking yields a statistically significant improvement in recognition performance over using static attributes alone, and (2) multilayered filtering yields a statistically significant improvement in recognition performance over the nonfiltering method. We also show that, on a set of twenty-four NATOPS gestures, our system achieves a recognition accuracy of 75.37%.
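
To make the multilayered filtering idea concrete, here is a minimal sketch of the two layers the abstract describes. It is not the authors' implementation: the paper's latent-dynamic CRF is replaced by a random stand-in labeler (`predict_window`), and the Gaussian smoothing kernel, window length of 15 frames, stride, and center-weighted voting scheme are all illustrative assumptions rather than values from the article.

```python
# Sketch of multilayered filtering for continuous gesture labeling:
# input-layer filtering (temporal smoothing of features) followed by
# prediction-layer filtering (weighted votes over overlapping windows).
import numpy as np

def smooth_inputs(features: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Input-layer filtering: Gaussian smoothing along time for each
    feature dimension, reducing input signal noise before labeling."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    # Convolve each feature dimension independently along the time axis.
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, features)

def predict_window(window: np.ndarray, num_labels: int) -> np.ndarray:
    """Stand-in for the per-window sequence labeler (an LDCRF in the paper):
    returns per-frame label probabilities, shape (len(window), num_labels)."""
    rng = np.random.default_rng(int(window.sum() * 1e6) % (2**32))
    scores = rng.random((window.shape[0], num_labels))
    return scores / scores.sum(axis=1, keepdims=True)

def label_stream(features: np.ndarray, num_labels: int,
                 win_len: int = 15, stride: int = 1) -> np.ndarray:
    """Prediction-layer filtering: overlapping sliding windows each label
    their frames; a frame's final label is a weighted vote over all windows
    covering it, which also damps per-window estimation noise."""
    smoothed = smooth_inputs(features)
    T = smoothed.shape[0]
    votes = np.zeros((T, num_labels))
    center = (win_len - 1) / 2.0
    # Frames near a window's center get more weight than frames at its edges.
    weights = np.exp(-0.5 * ((np.arange(win_len) - center) / (win_len / 4.0)) ** 2)
    for start in range(0, T - win_len + 1, stride):
        probs = predict_window(smoothed[start:start + win_len], num_labels)
        votes[start:start + win_len] += weights[:, None] * probs
    return votes.argmax(axis=1)  # one gesture label per frame

# Example: 120 frames of a hypothetical 12-D combined body-and-hand feature
# vector, labeled with one of 24 gesture classes (the NATOPS vocabulary size).
frames = np.random.default_rng(0).random((120, 12))
print(label_stream(frames, num_labels=24)[:30])
```

Because every frame falls inside several overlapping windows, the vote aggregation arbitrates between windows that disagree near gesture boundaries, which is what lets the system segment and label an unbounded stream online without ever committing to a single fixed segmentation.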