Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots interact directly with people, natural and easy-to-use user interfaces are of fundamental importance. This paper describes a flexible multimodal interface, based on speech and gesture modalities, for controlling our mobile robot named Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters dedicated to the upper human body extremities to track and recognize pointing and symbolic gestures, both one-handed and two-handed. This framework constitutes our first contribution: it is shown to properly handle the natural artifacts of performing 3D gestures with either or both hands, namely self-occlusion, hands leaving the camera's field of view, and hand deformation. A speech recognition and understanding system based on the Julius engine is also developed and embedded in order to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses the results of the speech and gesture components; this interpreter is shown to improve the classification rates of multimodal commands compared to using either modality alone. Finally, we report on successful live experiments in human-centered settings. Results are reported in the context of an interactive manipulation task, where users specify local motion commands to Jido and perform safe object exchanges.