Visual and linguistic information in gesture classification

  • Authors:
  • Jacob Eisenstein; Randall Davis

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, MA (both authors)

  • Venue:
  • Proceedings of the 6th International Conference on Multimodal Interfaces (ICMI '04)
  • Year:
  • 2004

Abstract

Classification of natural hand gestures is usually approached by applying pattern recognition to the movements of the hand. However, the gesture categories most frequently cited in the psychology literature are fundamentally multimodal: their definitions refer to the surrounding linguistic context. We address the question of whether gestures are naturally multimodal, or whether they can be classified from hand-movement data alone. First, we describe an empirical study showing that removing auditory information significantly impairs the ability of human raters to classify gestures. Then we present an automatic gesture classification system based solely on an n-gram model of linguistic context; the system is intended to supplement a visual classifier, but on its own achieves 66% accuracy on a three-class classification problem. This exceeds the accuracy human raters achieve when presented with the same information.
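
To make the n-gram approach concrete, below is a minimal sketch of a linguistic-context classifier: per-class bigram language models with add-one smoothing, where a gesture is assigned to the class whose model best explains the nearby words. Everything in the sketch is an assumption for illustration; the class labels (deictic, iconic, beat), the smoothing scheme, and the training snippets are placeholders, not the paper's actual model or data.

```python
import math
from collections import defaultdict


def bigrams(tokens):
    """Pad a token list with sentence markers and return its bigrams."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))


class BigramGestureClassifier:
    """One bigram language model per gesture class; classify by likelihood."""

    def __init__(self):
        self.pair_counts = {}     # class -> (w1, w2) -> count
        self.context_counts = {}  # class -> w1 -> count
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (class_label, token_list) pairs."""
        for label, tokens in examples:
            pairs = self.pair_counts.setdefault(label, defaultdict(int))
            ctx = self.context_counts.setdefault(label, defaultdict(int))
            for w1, w2 in bigrams(tokens):
                pairs[(w1, w2)] += 1
                ctx[w1] += 1
                self.vocab.update((w1, w2))

    def log_prob(self, label, tokens):
        """Add-one (Laplace) smoothed bigram log-likelihood of the tokens."""
        pairs = self.pair_counts[label]
        ctx = self.context_counts[label]
        v = len(self.vocab)
        return sum(
            math.log((pairs[(w1, w2)] + 1) / (ctx[w1] + v))
            for w1, w2 in bigrams(tokens)
        )

    def classify(self, tokens):
        """Return the class whose language model best explains the words."""
        return max(self.pair_counts, key=lambda c: self.log_prob(c, tokens))


if __name__ == "__main__":
    # Hypothetical training snippets: words spoken near each gesture.
    clf = BigramGestureClassifier()
    clf.train([
        ("deictic", "look at that one over there".split()),
        ("deictic", "put it right here next to this".split()),
        ("iconic", "it spins around and around like this".split()),
        ("iconic", "the box is about this wide".split()),
        ("beat", "and so then we decided to go ahead".split()),
        ("beat", "well anyway the point is it worked".split()),
    ])
    print(clf.classify("that one over here".split()))  # likely "deictic"
```

The design choice worth noting is that classification needs no hand-movement features at all: each class is scored purely by how probable the surrounding speech is under that class's language model, which is one plausible reading of "an n-gram model of linguistic context" in the abstract.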