Multimodal interaction in an augmented reality scenario

Authors:
Gunther Heidemann;Ingo Bax;Holger Bekel
Affiliations:
Bielefeld University, Bielefeld, Germany;Bielefeld University, Bielefeld, Germany;Bielefeld University, Bielefeld, Germany
Venue:
Proceedings of the 6th international conference on Multimodal interfaces
Year:
2004

Abstract

We describe an augmented reality system designed for online acquisition of visual knowledge and retrieval of memorized objects. The system relies on a head-mounted camera and display, which allow the user to view the environment together with augmentations overlaid by the system. In this setup, communication by hand gestures and speech is mandatory, as common input devices such as mouse and keyboard are not available. Gesture and speech must handle three types of tasks: (i) communication with the system about the environment, in particular directing attention towards objects and commanding the memorization of sample views; (ii) control of system operation, e.g. switching between display modes; and (iii) re-adaptation of the interface itself in case communication becomes unreliable due to changes in external factors, such as illumination conditions. We present an architecture to manage these tasks, and we describe and evaluate several of its key elements, including modules for pointing gesture recognition, menu control based on gesture and speech, and control strategies for situations in which vision becomes unreliable and has to be re-adapted by speech.
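
The three task types and the speech-driven fallback described in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the command names, the confidence threshold, the failure counter, and the `InteractionManager` class are all assumptions made up for illustration. The sketch only shows one plausible way to route gesture and speech events to task categories and to stop trusting gestures until a spoken command re-adapts vision.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Task(Enum):
    ENVIRONMENT = auto()  # (i) refer to objects, memorize sample views
    CONTROL = auto()      # (ii) control system operation, e.g. display modes
    READAPT = auto()      # (iii) re-adapt the interface itself


@dataclass
class Event:
    modality: str      # "gesture" or "speech"
    command: str       # recognized command name
    confidence: float  # recognizer confidence in [0, 1]


# Hypothetical command vocabulary, invented for this sketch.
COMMAND_TASKS = {
    "point_at_object": Task.ENVIRONMENT,
    "memorize_view": Task.ENVIRONMENT,
    "switch_display_mode": Task.CONTROL,
    "readapt_vision": Task.READAPT,
}


class InteractionManager:
    """Routes events to tasks and tracks whether vision is trusted.

    The threshold and failure count are assumed values, not from the paper.
    """

    def __init__(self, min_confidence=0.5, max_failures=3):
        self.min_confidence = min_confidence
        self.max_failures = max_failures
        self.failures = 0
        self.vision_reliable = True

    def handle(self, event):
        """Return the Task to run, or None if the event is rejected."""
        task = COMMAND_TASKS.get(event.command)
        if task is None:
            return None  # unknown command

        if event.modality == "gesture":
            if not self.vision_reliable:
                return None  # gestures are ignored until re-adaptation
            if event.confidence < self.min_confidence:
                # Count consecutive low-confidence gestures.
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.vision_reliable = False
                return None
            self.failures = 0
            return task

        # Speech path: stays available when vision is unreliable.
        if event.confidence < self.min_confidence:
            return None
        if task is Task.READAPT:
            # A spoken re-adaptation command restores trust in vision.
            self.failures = 0
            self.vision_reliable = True
        return task
```

In this sketch, three consecutive low-confidence gestures (for example after a change in illumination) switch the manager to speech-only operation, and a spoken `readapt_vision` command switches gesture input back on. A real system would run an actual re-adaptation procedure at that point rather than simply resetting a flag.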