Untethered multimodal interfaces are more attractive than tethered ones because they allow more natural and expressive interaction. Such interfaces usually require robust vision-based body pose estimation and gesture recognition. In interfaces where a user interacts with a computer using speech and arm gestures, the user's spoken keywords can be recognized in conjunction with hypotheses of body poses. This co-occurrence can reduce the number of body pose hypotheses the vision-based tracker must consider. In this paper we show that incorporating speech-based body pose constraints can increase the robustness and accuracy of vision-based tracking systems.

Next, we describe an approach to gesture recognition. We show how Linear Discriminant Analysis (LDA) can be employed to estimate 'good features' for a standard HMM-based gesture recognition system, and that applying our LDA scheme significantly reduces recognition errors compared with a standard HMM-based technique.

We applied both techniques in a Virtual Home Desktop scenario. In experiments where users controlled a desktop system using gestures and speech, the results show that speech recognized in conjunction with body poses increased the accuracy of the vision-based tracking system.
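As a rough illustration of the two ideas described above, the following Python sketch (assuming scikit-learn and hmmlearn are available) shows how a recognized keyword could prune a tracker's pose hypotheses, and how LDA-projected features could feed per-gesture Gaussian HMMs. All names here (KEYWORD_POSES, train_recognizer, and the frame-level labeling of the LDA training data) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from hmmlearn.hmm import GaussianHMM

# Hypothetical mapping from spoken keywords to the pose classes they are
# likely to co-occur with; the paper's actual constraint set is not given here.
KEYWORD_POSES = {"select": {"point"}, "grab": {"reach", "fist"}}

def reweight_hypotheses(pose_labels, weights, keyword):
    """Down-weight tracker pose hypotheses that are incompatible with a
    co-occurring spoken keyword, then renormalize."""
    compatible = KEYWORD_POSES.get(keyword)
    if compatible is None:  # unknown keyword: apply no constraint
        return np.asarray(weights, dtype=float)
    w = np.array([wi if p in compatible else 1e-6 * wi
                  for p, wi in zip(pose_labels, weights)])
    return w / w.sum()

def train_recognizer(train_seqs, train_labels, n_lda_dims=3, n_states=5):
    """train_seqs: list of (T_i, D) arrays of per-frame pose features;
    train_labels: one gesture class id per sequence."""
    # Fit LDA on all frames; labeling each frame with its sequence's
    # gesture class is one simple choice, not necessarily the paper's.
    frames = np.vstack(train_seqs)
    frame_labels = np.concatenate(
        [np.full(len(s), c) for s, c in zip(train_seqs, train_labels)])
    lda = LinearDiscriminantAnalysis(n_components=n_lda_dims)
    lda.fit(frames, frame_labels)

    # One Gaussian HMM per gesture class, trained on LDA-projected sequences.
    models = {}
    for c in np.unique(train_labels):
        seqs = [lda.transform(s)
                for s, l in zip(train_seqs, train_labels) if l == c]
        m = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=20)
        m.fit(np.vstack(seqs), [len(s) for s in seqs])
        models[c] = m
    return lda, models

def classify(seq, lda, models):
    """Label a new gesture sequence by maximum HMM log-likelihood."""
    z = lda.transform(seq)
    return max(models, key=lambda c: models[c].score(z))
```

In use, reweight_hypotheses would be called on each frame where the speech recognizer fires, while classify handles segmented gesture sequences; the paper's actual coupling between the speech and vision components may differ.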