Humans use a combination of gesture and speech to convey meaning, and usually do so without holding a device or pointer. We present a system that incorporates body tracking and gesture recognition for an untethered human-computer interface. This research focuses on a module that provides parameterized gesture recognition using several machine learning techniques. We train a support vector classifier to model the boundary of the space of possible gestures, and train Hidden Markov Models on specific gestures. Given a sequence, we can find the start and end of gestures using the support vector classifier, and compute gesture likelihoods and parameters with an HMM. Finally, multimodal recognition is performed using rank-order fusion to merge speech and vision hypotheses.
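As a rough illustration of the final fusion step, the sketch below implements a generic rank-order fusion of speech and vision hypothesis lists: each modality ranks its hypotheses by score, and the combined ranking sums the per-modality ranks. The hypothesis strings, the summed-rank combination rule, and the handling of hypotheses missing from one modality are assumptions for illustration, not the paper's exact formulation.

```python
from typing import Dict, List


def rank_order_fusion(speech_scores: Dict[str, float],
                      vision_scores: Dict[str, float]) -> List[str]:
    """Merge two scored hypothesis lists by summing per-modality ranks.

    Each input maps a hypothesis to that modality's score (higher is
    better). A hypothesis missing from one modality receives the worst
    rank for that modality. Returns hypotheses ordered best-first.
    """
    hypotheses = set(speech_scores) | set(vision_scores)

    def ranks(scores: Dict[str, float]) -> Dict[str, int]:
        # Rank 0 is the best-scoring hypothesis within this modality.
        ordered = sorted(scores, key=scores.get, reverse=True)
        worst = len(hypotheses)
        r = {h: worst for h in hypotheses}
        r.update({h: i for i, h in enumerate(ordered)})
        return r

    speech_rank = ranks(speech_scores)
    vision_rank = ranks(vision_scores)

    # Lower combined rank = better joint speech/vision hypothesis.
    return sorted(hypotheses, key=lambda h: speech_rank[h] + vision_rank[h])


if __name__ == "__main__":
    # Hypothetical command hypotheses, not taken from the paper.
    speech = {"move chair": -1.2, "rotate chair": -3.5, "delete chair": -6.0}
    vision = {"move chair": -0.8, "rotate chair": -1.1}
    print(rank_order_fusion(speech, vision))  # best joint hypothesis first
```

A summed-rank rule is only one possible combination; weighting the modalities or fusing raw likelihoods instead of ranks are natural variants of the same idea.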