From a wizard of Oz experiment to a real time speech and gesture multimodal interface

  • Authors:
  • S. Carbini; L. Delphin-Poulat; L. Perron; J. E. Viallet

  • Affiliations:
  • France Télécom R&D, Avenue Pierre Marzin, Lannion, France (all authors)

  • Venue:
  • Signal Processing - Special section: Multimodal human-computer interfaces
  • Year:
  • 2006

Abstract

This paper describes a Wizard of Oz cooperative storytelling experiment named Virstory, in which a user's speech and gesture actions are interpreted so that the user can cooperatively build a story with another person, the partner of the interpreter. The gesture, speech and multimodal behaviours of 20 subjects are detailed. The multimodal oral with gesture large display interface (MOWGLI) is then described. It is an oral and gesture multimodal human-computer interface that allows users to interact remotely in real time. The continuous pointing direction of one hand and discrete selection gestures of the other hand are recognized by computer-vision tracking of the user's head and hands. By associating gesture recognition with speech recognition of oral selection and deselection commands, MOWGLI behaves as a virtual, contactless, application-independent multimodal mouse. Discrete pointing locations, corresponding to discrete speech or gesture selection time events, are extracted from the continuous pointing process. A large vocabulary related to a chess game application allows shorter, more specific multimodal commands, such as pointing at the desired location 〈there〉 while uttering an oral piece-move command, without needing a previous pointing gesture to the piece's location, whereas generic "Put that there" commands need two successive pointing locations (〈that〉 and 〈there〉). Contextual constraints, such as the displacement rules of the pieces and the current game position, allow the interpretation of ambiguous commands and lead to shorter multimodal commands.
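
The abstract does not include an implementation, but the fusion step it describes, extracting a pointing location from the continuous tracking stream at the instant of a discrete speech or gesture selection event, can be sketched roughly as follows. The Python below is a minimal illustration under assumed data structures; the names PointingSample, SelectionEvent and fuse are hypothetical and not taken from MOWGLI. In the actual system the pointing stream would come from the head/hand tracker and the events from the gesture and speech recognizers.

```python
import bisect
from dataclasses import dataclass

@dataclass
class PointingSample:
    t: float   # timestamp in seconds
    x: float   # pointed screen coordinates (normalized 0..1)
    y: float

@dataclass
class SelectionEvent:
    t: float       # time of the spoken or gestural selection
    command: str   # e.g. "that", "there", or a piece-move utterance

def pointing_at(samples, t):
    """Return the pointing sample closest in time to a discrete event.

    `samples` is assumed to be sorted by timestamp, as produced by a
    continuous tracking process.
    """
    times = [s.t for s in samples]
    i = bisect.bisect_left(times, t)
    candidates = samples[max(0, i - 1):i + 1]
    return min(candidates, key=lambda s: abs(s.t - t))

def fuse(samples, events):
    """Pair each discrete speech/gesture selection event with the pointing
    location extracted from the continuous pointing stream at that instant."""
    return [(e.command, pointing_at(samples, e.t)) for e in events]

# Toy usage: a slowly moving pointing stream and two selection events,
# as in a generic "Put that there" command.
samples = [PointingSample(t=i * 0.04, x=0.1 + 0.01 * i, y=0.5) for i in range(50)]
events = [SelectionEvent(t=0.80, command="that"),
          SelectionEvent(t=1.60, command="there")]

for command, sample in fuse(samples, events):
    print(f"{command!r} resolved to screen position ({sample.x:.2f}, {sample.y:.2f})")
```

In the chess application described above, the location resolved for a 〈there〉 event would additionally be checked against the legal moves of the named piece and the current game position, which is how contextual constraints let a single pointing gesture replace the generic two-step "Put that there" command.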