Classification of natural hand gestures is usually approached by applying pattern recognition to the movements of the hand alone. However, the gesture categories most frequently cited in the psychology literature are fundamentally multimodal: their definitions refer to the surrounding linguistic context. We address the question of whether gestures are naturally multimodal, or whether they can be classified from hand-movement data alone. First, we describe an empirical study showing that removing auditory information significantly impairs human raters' ability to classify gestures. We then present an automatic gesture classification system based solely on an n-gram model of the linguistic context; the system is intended to supplement a visual classifier, but on its own it achieves 66% accuracy on a three-class classification problem. This exceeds the accuracy human raters achieve when presented with the same information.
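To make the approach concrete, here is a minimal sketch of a class-conditional n-gram classifier of the kind the abstract describes: one smoothed bigram language model per gesture class, with the class whose model best explains the transcript context winning. This is an illustration under stated assumptions, not the paper's implementation; the `NgramGestureClassifier` name, the bigram order, Laplace smoothing, and the three-way label set (deictic/iconic/beat) are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


class NgramGestureClassifier:
    """Class-conditional bigram model over transcript context.

    Hypothetical sketch: the label set, n-gram order, and smoothing
    are assumptions, not the paper's exact configuration.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant
        self.bigrams = defaultdict(Counter)   # label -> bigram counts
        self.unigrams = defaultdict(Counter)  # label -> left-context counts
        self.priors = Counter()               # label -> training-example count
        self.vocab = set()

    def _tokens(self, text):
        # Pad with sentence boundaries so bigrams cover the edges.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def fit(self, contexts, labels):
        for text, label in zip(contexts, labels):
            toks = self._tokens(text)
            self.vocab.update(toks)
            self.priors[label] += 1
            for prev, cur in zip(toks, toks[1:]):
                self.bigrams[label][(prev, cur)] += 1
                self.unigrams[label][prev] += 1

    def _log_prob(self, toks, label):
        # log P(label) + sum of smoothed log P(cur | prev, label)
        v = len(self.vocab)
        lp = math.log(self.priors[label] / sum(self.priors.values()))
        for prev, cur in zip(toks, toks[1:]):
            num = self.bigrams[label][(prev, cur)] + self.alpha
            den = self.unigrams[label][prev] + self.alpha * v
            lp += math.log(num / den)
        return lp

    def predict(self, text):
        toks = self._tokens(text)
        return max(self.priors, key=lambda lab: self._log_prob(toks, lab))


if __name__ == "__main__":
    clf = NgramGestureClassifier()
    clf.fit(
        ["look at this one here",
         "it spins around like this",
         "and then we move on"],
        ["deictic", "iconic", "beat"],  # hypothetical three-class label set
    )
    print(clf.predict("that one over there"))
```

A generative bigram model is only one way to exploit linguistic context; a discriminative classifier over n-gram features would be a natural alternative, and nothing here depends on the specific three class labels chosen for the toy training data.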