Improving continuous gesture recognition with spoken prosody
CVPR '03: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Although recognition of natural speech and gestures has been studied extensively, previous attempts at combining them in a unified framework to boost classification were mostly semantically motivated, e.g., by keyword-gesture co-occurrence. Such formulations inherit the complexity of natural language processing. This paper presents a Bayesian formulation that exploits a phenomenon of gesture and speech articulation to improve the accuracy of automatic recognition of continuous coverbal gestures. Prosodic features from the speech signal were co-analyzed with the visual signal to learn the prior probability of co-occurrence of prominent spoken segments with particular kinematic phases of gestures. This co-analysis was found to help detect and disambiguate small hand movements, which in turn improves the rate of continuous gesture recognition. The efficacy of the proposed approach was demonstrated on a large database collected from Weather Channel broadcasts. The formulation opens new avenues for bottom-up frameworks of multimodal integration.
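The core idea of the abstract — a prosody-conditioned prior over gesture kinematic phases combined with a visual likelihood via Bayes' rule — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the phase labels, prior tables, and likelihood values are all illustrative assumptions.

```python
# Hedged sketch of a Bayesian prosody-visual co-analysis.
# Assumption: priors P(phase | prosodic prominence) would be learned
# from aligned audio-visual training data; values here are made up.

PHASES = ["preparation", "stroke", "retraction", "hold"]

PRIOR_GIVEN_PROMINENT = {"preparation": 0.1, "stroke": 0.6,
                         "retraction": 0.1, "hold": 0.2}
PRIOR_GIVEN_NEUTRAL = {"preparation": 0.3, "stroke": 0.2,
                       "retraction": 0.3, "hold": 0.2}

def phase_posterior(visual_likelihood, prominent):
    """Combine a visual likelihood P(obs | phase) with a
    prosody-conditioned prior P(phase | prosody) via Bayes' rule."""
    prior = PRIOR_GIVEN_PROMINENT if prominent else PRIOR_GIVEN_NEUTRAL
    unnorm = {p: visual_likelihood[p] * prior[p] for p in PHASES}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

# A small, ambiguous hand movement: vision alone is nearly uniform.
vis = {"preparation": 0.3, "stroke": 0.3, "retraction": 0.2, "hold": 0.2}

with_prosody = phase_posterior(vis, prominent=True)
without_prosody = phase_posterior(vis, prominent=False)
print(max(with_prosody, key=with_prosody.get))    # prominence favors "stroke"
print(max(without_prosody, key=without_prosody.get))
```

The point of the sketch is the disambiguation effect described in the abstract: with a flat visual likelihood, the prosody-conditioned prior shifts the posterior toward the stroke phase during prominent speech, where vision alone could not decide.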