From a wizard of Oz experiment to a real time speech and gesture multimodal interface

  • Authors:
  • S. Carbini; L. Delphin-Poulat; L. Perron; J. E. Viallet

  • Affiliations:
  • France Télécom R&D, Avenue Pierre Marzin, Lannion, France (all authors)

  • Venue:
  • Signal Processing - Special section: Multimodal human-computer interfaces
  • Year:
  • 2006

Abstract

This paper describes a Wizard of Oz cooperative storytelling experiment named Virstory, in which a user's speech and gesture actions are interpreted so that the user can cooperatively build a story with another person, the partner of the interpreter. The gesture, speech and multimodal behaviours of 20 subjects are detailed. The multimodal oral with gesture large display interface (MOWGLI) is then described. It is an oral and gesture multimodal human-computer interface that allows users to interact remotely in real time. The continuous pointing direction of one hand and discrete selection gestures of the other hand are recognized by computer-vision tracking of the user's head and hands. By associating gesture recognition with speech recognition of oral selection and deselection commands, MOWGLI behaves as a virtual, contactless, application-independent multimodal mouse. Discrete pointing locations, corresponding to discrete speech or gesture selection time events, are extracted from the continuous pointing process. A large vocabulary related to a chess game application allows shorter, more specific multimodal commands, such as pointing at the desired location 〈there〉 while uttering an oral piece-move command, without needing a previous pointing gesture to the piece's location, whereas generic "Put that there" commands need two successive pointing locations (〈that〉 and 〈there〉). Contextual constraints, such as the displacement rules of the pieces and the current game position, allow the interpretation of ambiguous commands and lead to shorter multimodal commands.
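
The abstract does not include an implementation, but the fusion step it describes, extracting a pointing location from the continuous tracking stream at the instant of a discrete speech or gesture selection event, can be sketched roughly as follows. The Python below is a minimal illustration under assumed data structures; the names PointingSample, SelectionEvent and fuse are hypothetical and not taken from MOWGLI. In the actual system the pointing stream would come from the head/hand tracker and the events from the gesture and speech recognizers.

```python
import bisect
from dataclasses import dataclass

@dataclass
class PointingSample:
    t: float   # timestamp in seconds
    x: float   # pointed screen coordinates (normalized 0..1)
    y: float

@dataclass
class SelectionEvent:
    t: float       # time of the spoken or gestural selection
    command: str   # e.g. "that", "there", or a piece-move utterance

def pointing_at(samples, t):
    """Return the pointing sample closest in time to a discrete event.

    `samples` is assumed to be sorted by timestamp, as produced by a
    continuous tracking process.
    """
    times = [s.t for s in samples]
    i = bisect.bisect_left(times, t)
    candidates = samples[max(0, i - 1):i + 1]
    return min(candidates, key=lambda s: abs(s.t - t))

def fuse(samples, events):
    """Pair each discrete speech/gesture selection event with the pointing
    location extracted from the continuous pointing stream at that instant."""
    return [(e.command, pointing_at(samples, e.t)) for e in events]

# Toy usage: a slowly moving pointing stream and two selection events,
# as in a generic "Put that there" command.
samples = [PointingSample(t=i * 0.04, x=0.1 + 0.01 * i, y=0.5) for i in range(50)]
events = [SelectionEvent(t=0.80, command="that"),
          SelectionEvent(t=1.60, command="there")]

for command, sample in fuse(samples, events):
    print(f"{command!r} resolved to screen position ({sample.x:.2f}, {sample.y:.2f})")
```

In the chess application described above, the location resolved for a 〈there〉 event would additionally be checked against the legal moves of the named piece and the current game position, which is how contextual constraints let a single pointing gesture replace the generic two-step "Put that there" command.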