Improving continuous gesture recognition with spoken prosody

Authors:
Sanshzar Kettebekov;Mohammed Yeasin;Rajeev Sharma
Affiliations:
Department of Computer Science and Engineering, Pennsylvania State University, Pond Laboratories, University Park, PA;Department of Computer Science and Engineering, Pennsylvania State University, Pond Laboratories, University Park, PA;Department of Computer Science and Engineering, Pennsylvania State University, Pond Laboratories, University Park, PA
Venue:
CVPR '03: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Year:
2003

Abstract

Despite recent advances in gesture recognition, reliance on the visual signal alone to classify unrestricted continuous gesticulation is inherently error-prone. Since spontaneous gesticulation is mostly coverbal in nature, some attempts have been made to use speech cues to improve gesture recognition, e.g., keyword-gesture co-analysis. Such schemes are burdened by the complexity of natural language understanding. This paper offers a novel "signal-level" perspective by exploring prosodic phenomena of spontaneous gesture and speech coproduction. We present a computational framework for improving continuous gesture recognition based on two phenomena that capture voluntary (co-articulation) and involuntary (physiological) contributions of prosodic synchronization. Physiological constraints, manifested as signal interruptions in multimodal production, are exploited in an audio-visual feature integration framework using Hidden Markov Models (HMMs). Co-articulation is analyzed using a Bayesian network of naïve classifiers to explore the alignment of intonationally prominent speech segments and hand kinematics. The efficacy of the proposed approach was demonstrated on a multimodal corpus created from the Weather Channel broadcast. Both schemes were found to contribute uniquely by reducing different error types, which in turn improves the performance of continuous gesture recognition.
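
As a rough illustration of the audio-visual feature integration idea mentioned in the abstract, the sketch below fuses a per-frame visual feature with a per-frame prosodic feature and scores the fused sequence against two toy HMMs using the forward algorithm. It is only a minimal sketch under stated assumptions, not the authors' implementation: the feature choice (hand speed and normalized pitch), the two gesture labels, and all model parameters are hypothetical and hand-picked for the example.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(frames, hmm):
    """Log-likelihood of a frame sequence under a diagonal-Gaussian HMM.

    All start and transition probabilities must be strictly positive,
    because this sketch takes their logarithms directly.
    """
    n = len(hmm["start"])

    def emit(state, frame):
        # Independent Gaussian per feature dimension (visual, audio).
        return sum(log_gauss(x, m, v)
                   for x, (m, v) in zip(frame, hmm["emit"][state]))

    alpha = [math.log(hmm["start"][s]) + emit(s, frames[0]) for s in range(n)]
    for frame in frames[1:]:
        alpha = [
            logsumexp([alpha[p] + math.log(hmm["trans"][p][s]) for p in range(n)])
            + emit(s, frame)
            for s in range(n)
        ]
    return logsumexp(alpha)

# Hypothetical emission parameters: (mean, variance) per feature dimension.
HIGH = [(1.0, 0.1), (1.0, 0.1)]  # fast hand motion, prominent pitch
LOW = [(0.1, 0.1), (0.1, 0.1)]   # near-still hand, speech interruption

# Two toy gesture models (assumed labels): active-then-quiet vs. quiet-then-active.
MODELS = {
    "gesture_a": {"start": [0.9, 0.1], "trans": [[0.7, 0.3], [0.1, 0.9]],
                  "emit": [HIGH, LOW]},
    "gesture_b": {"start": [0.9, 0.1], "trans": [[0.7, 0.3], [0.1, 0.9]],
                  "emit": [LOW, HIGH]},
}

def classify(frames):
    """Return the label of the model with the highest forward log-likelihood."""
    return max(MODELS, key=lambda name: forward_loglik(frames, MODELS[name]))

# Feature-level fusion: pair each visual frame with its time-aligned audio frame.
hand_speed = [0.9, 1.1, 1.0, 0.2, 0.1]   # hypothetical visual stream
pitch = [1.2, 1.0, 1.1, 0.1, 0.0]        # hypothetical prosodic stream
fused = list(zip(hand_speed, pitch))

print(classify(fused))
```

Fusing at the feature level lets a drop in both streams at the same frames (a joint interruption) influence the state sequence directly; a decision-level combination of separate audio and visual models would be an alternative design.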