Untethered multimodal interfaces are more attractive than tethered ones because they allow more natural and expressive interaction. Such interfaces usually require robust vision-based body pose estimation and gesture recognition. In interfaces where a user interacts with a computer using speech and arm gestures, the user's spoken keywords can be recognized in conjunction with hypotheses of body poses. This co-occurrence can reduce the number of body pose hypotheses the vision-based tracker must consider. In this paper we show that incorporating speech-based body pose constraints can increase the robustness and accuracy of vision-based tracking systems.

Next, we describe an approach to gesture recognition. We show how Linear Discriminant Analysis (LDA) can be employed to estimate 'good features' for a standard HMM-based gesture recognition system, and that applying our LDA scheme significantly reduces recognition errors compared with a standard HMM-based technique.

We applied both techniques in a Virtual Home Desktop scenario. In experiments where users controlled a desktop system using gestures and speech, the results show that speech recognized in conjunction with body poses increased the accuracy of the vision-based tracking system.
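As a rough illustration of the two ideas described above, the following Python sketch (assuming scikit-learn and hmmlearn are available) shows how a recognized keyword could prune a tracker's pose hypotheses, and how LDA-projected features could feed per-gesture Gaussian HMMs. All names here (KEYWORD_POSES, train_recognizer, and the frame-level labeling of the LDA training data) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from hmmlearn.hmm import GaussianHMM

# Hypothetical mapping from spoken keywords to the pose classes they are
# likely to co-occur with; the paper's actual constraint set is not given here.
KEYWORD_POSES = {"select": {"point"}, "grab": {"reach", "fist"}}

def reweight_hypotheses(pose_labels, weights, keyword):
    """Down-weight tracker pose hypotheses that are incompatible with a
    co-occurring spoken keyword, then renormalize."""
    compatible = KEYWORD_POSES.get(keyword)
    if compatible is None:  # unknown keyword: apply no constraint
        return np.asarray(weights, dtype=float)
    w = np.array([wi if p in compatible else 1e-6 * wi
                  for p, wi in zip(pose_labels, weights)])
    return w / w.sum()

def train_recognizer(train_seqs, train_labels, n_lda_dims=3, n_states=5):
    """train_seqs: list of (T_i, D) arrays of per-frame pose features;
    train_labels: one gesture class id per sequence."""
    # Fit LDA on all frames; labeling each frame with its sequence's
    # gesture class is one simple choice, not necessarily the paper's.
    frames = np.vstack(train_seqs)
    frame_labels = np.concatenate(
        [np.full(len(s), c) for s, c in zip(train_seqs, train_labels)])
    lda = LinearDiscriminantAnalysis(n_components=n_lda_dims)
    lda.fit(frames, frame_labels)

    # One Gaussian HMM per gesture class, trained on LDA-projected sequences.
    models = {}
    for c in np.unique(train_labels):
        seqs = [lda.transform(s)
                for s, l in zip(train_seqs, train_labels) if l == c]
        m = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=20)
        m.fit(np.vstack(seqs), [len(s) for s in seqs])
        models[c] = m
    return lda, models

def classify(seq, lda, models):
    """Label a new gesture sequence by maximum HMM log-likelihood."""
    z = lda.transform(seq)
    return max(models, key=lambda c: models[c].score(z))
```

In use, reweight_hypotheses would be called on each frame where the speech recognizer fires, while classify handles segmented gesture sequences; the paper's actual coupling between the speech and vision components may differ.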