The logic of typed feature structures
Integration and synchronization of input modes during multimodal human-computer interaction
Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems
Communicative Rhythm in Gesture and Speech
GW '99 Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction
“Put-that-there”: Voice and gesture at the graphics interface
SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques
The Karlsruhe-Verbmobil Speech Recognition Engine
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) - Volume 1
Multimodal Interaction During Multiparty Dialogues: Initial Results
ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
A Map-Based System Using Speech and 3D Gestures for Pervasive Computing
ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Prosody Based Co-analysis for Continuous Recognition of Coverbal Gestures
ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
SmartKom: adaptive and flexible multimodal access to multiple applications
Proceedings of the 5th International Conference on Multimodal Interfaces
Pointing gesture recognition based on 3D-tracking of face, hands and head orientation
Proceedings of the 5th International Conference on Multimodal Interfaces
Where is "it"? Event Synchronization in Gaze-Speech Input Systems
Proceedings of the 5th International Conference on Multimodal Interfaces
Unification-based multimodal integration
ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Unification-based multimodal parsing
COLING '98 Proceedings of the 17th International Conference on Computational Linguistics - Volume 1
Rapid prototyping for spoken dialogue systems
COLING '02 Proceedings of the 19th International Conference on Computational Linguistics - Volume 1
MATCH: an architecture for multimodal dialogue systems
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Multimodal integration - a statistical view
IEEE Transactions on Multimedia
Inferring body pose using speech content
ICMI '05 Proceedings of the 7th International Conference on Multimodal Interfaces
Put a grammar here: bi-directional parsing in multimodal interaction
CHI '06 Extended Abstracts on Human Factors in Computing Systems
Fusion of children's speech and 2D gestures when conversing with 3D characters
Signal Processing - Special section: Multimodal human-computer interfaces
Visual recognition of pointing gestures for human-robot interaction
Image and Vision Computing
"Move the couch where?": developing an augmented reality multimodal interface
ISMAR '06 Proceedings of the 5th IEEE and ACM International Symposium on Mixed and Augmented Reality
Clavius: bi-directional parsing for generic multimodal interaction
COLING-ACL '06 Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Towards a Multidimensional Approach for the Evaluation of Multimodal Application User Interfaces
Proceedings of the 13th International Conference on Human-Computer Interaction. Part II: Novel Interaction Methods and Techniques
Fusion engines for multimodal input: a survey
Proceedings of the 2009 International Conference on Multimodal Interfaces
Benchmarking fusion engines of multimodal interactive systems
Proceedings of the 2009 International Conference on Multimodal Interfaces
Concept-based evidential reasoning for multimodal fusion in human-computer interaction
Applied Soft Computing
Multimodal interaction with an autonomous forklift
Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction
A robot learns to know people: first contacts of a robot
KI '06 Proceedings of the 29th Annual German Conference on Artificial Intelligence
An input-parsing algorithm supporting integration of deictic gesture in natural language interface
HCI '07 Proceedings of the 12th International Conference on Human-Computer Interaction: Intelligent Multimodal Interaction Environments
An evaluation of an augmented reality multimodal interface using speech and paddle gestures
ICAT '06 Proceedings of the 16th International Conference on Advances in Artificial Reality and Tele-Existence
Using intelligent natural user interfaces to support sales conversations
Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces
Speak up your mind: using speech to capture innovative ideas on interactive surfaces
Proceedings of the 10th Brazilian Symposium on Human Factors in Computing Systems and the 5th Latin American Conference on Human-Computer Interaction
Usability evaluation of MUI: a case study
Proceedings of the 10th Brazilian Symposium on Human Factors in Computing Systems and the 5th Latin American Conference on Human-Computer Interaction
Modeling ontology for multimodal interaction in ubiquitous computing systems
Proceedings of the 2012 ACM Conference on Ubiquitous Computing
Using the transferable belief model for multimodal input fusion in companion systems
MPRSS '12 Proceedings of the First International Conference on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction
Free-hand pointing for identification and interaction with distant objects
Proceedings of the 5th International Conference on Automotive User Interfaces and Interactive Vehicular Applications
This paper presents an architecture for fusing multimodal input streams for natural interaction with a humanoid robot, together with results from a user study of our system. The fusion architecture consists of an application-independent parser of input events and a set of application-specific rules. In the user study, participants interacted with a robot in a kitchen scenario using speech and gesture input. We observed that our fusion approach is highly tolerant of falsely detected pointing gestures, because speech serves as the main modality and pointing gestures are used mainly to disambiguate object references. We also report on the temporal correlation of speech and gesture events as observed in the study.
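The speech-driven, rule-based fusion idea described in the abstract can be illustrated with a minimal sketch. This is an assumed illustration, not the paper's implementation: the event types, the deictic-resolution rule, and the 1.5-second correlation window are all hypothetical values chosen for the example. The key property it demonstrates is the tolerance mentioned above: a pointing gesture is only consulted when the utterance contains a deictic reference, so spurious gesture detections are simply ignored.

```python
from dataclasses import dataclass

@dataclass
class SpeechEvent:
    text: str       # recognized utterance, e.g. "put that in the sink"
    deictic: bool   # True if the utterance contains "that"/"there"
    t: float        # timestamp in seconds

@dataclass
class PointingEvent:
    target: str     # object the gesture recognizer believes was pointed at
    t: float        # timestamp in seconds

# Assumed maximum time offset between correlated speech and gesture events.
WINDOW = 1.5

def fuse(speech: SpeechEvent, gestures: list[PointingEvent]) -> str:
    """Application-specific rule: resolve a deictic reference using the
    temporally closest pointing gesture inside the correlation window."""
    if not speech.deictic:
        # No deictic reference: gestures (including false detections)
        # are ignored entirely, which makes the approach robust.
        return speech.text
    candidates = [g for g in gestures if abs(g.t - speech.t) <= WINDOW]
    if not candidates:
        return speech.text + " [unresolved reference]"
    best = min(candidates, key=lambda g: abs(g.t - speech.t))
    return speech.text.replace("that", best.target)

# Example: the spurious gesture at t=3.0 falls outside the window and
# is ignored; the gesture at t=10.4 resolves "that".
speech = SpeechEvent("put that in the sink", deictic=True, t=10.0)
gestures = [PointingEvent("spoon", t=3.0), PointingEvent("cup", t=10.4)]
print(fuse(speech, gestures))  # -> "put cup in the sink"
```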