Bayesian networks for speech and image integration

Authors:
Sven Wachsmuth;Gerhard Sagerer
Affiliations:
Bielefeld University, Faculty of Technology, 33594 Bielefeld, Germany;Bielefeld University, Faculty of Technology, 33594 Bielefeld, Germany
Venue:
Eighteenth National Conference on Artificial Intelligence
Year:
2002

Abstract

The realization of natural human-computer interfaces suffers from a wide range of restrictions concerning noisy data, vague meanings, and context dependence. An essential aspect of everyday communication is the ability of humans to ground verbal interpretations in visual perception. A system supporting such communication therefore has to solve the correspondence problem of relating verbal and visual descriptions of the same object. This contribution proposes a new solution to this problem using Bayesian networks. To capture the vague meanings of adjectives used by the speaker, psycholinguistic experiments are evaluated. Object recognition errors are taken into account through conditional probabilities estimated on test sets. The Bayesian network is built dynamically from the verbal object description and is evaluated by an inference technique that combines bucket elimination and conditioning. Results show that speech and image data are interpreted more robustly in the combined case than when each modality is interpreted in isolation.
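
To make the correspondence problem concrete, the following is a minimal sketch, not the authors' network. It assumes a toy scene in which each visual object has a hidden true type, an observed recognizer label, and a spoken noun that depends on the true type of the intended object. All type names, probability tables, and function names are hypothetical and chosen only for illustration. The hidden type is summed out by plain enumeration rather than by the combination of bucket elimination and conditioning described in the abstract.

```python
# Toy correspondence model: which visual object does a spoken noun refer to?
# Every name and number below is a hypothetical illustration, not data from the paper.

TYPES = ["bolt", "cube", "bar"]

# Hypothetical prior P(true type).
PRIOR = {"bolt": 1 / 3, "cube": 1 / 3, "bar": 1 / 3}

# Hypothetical recognizer confusion table P(recognized label | true type),
# standing in for error rates that would be estimated on a test set.
CONFUSION = {
    "bolt": {"bolt": 0.8, "cube": 0.1, "bar": 0.1},
    "cube": {"bolt": 0.1, "cube": 0.7, "bar": 0.2},
    "bar": {"bolt": 0.1, "cube": 0.2, "bar": 0.7},
}

# Hypothetical naming table P(spoken noun | true type), modelling vague word use.
NAMING = {
    "bolt": {"bolt": 0.9, "cube": 0.05, "bar": 0.05},
    "cube": {"bolt": 0.05, "cube": 0.8, "bar": 0.15},
    "bar": {"bolt": 0.05, "cube": 0.15, "bar": 0.8},
}


def type_posterior(recognized):
    """Return P(true type | recognized label) by Bayes' rule."""
    joint = {t: PRIOR[t] * CONFUSION[t][recognized] for t in TYPES}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}


def referent_posterior(recognized_labels, spoken_noun):
    """Return P(referent = i | recognizer labels, spoken noun) for each object i.

    Assumes a uniform prior over which object is meant and independent
    object types; the hidden true type of each candidate is summed out.
    """
    scores = []
    for label in recognized_labels:
        post = type_posterior(label)
        scores.append(sum(post[t] * NAMING[t][spoken_noun] for t in TYPES))
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Two objects in the scene, labelled "bolt" and "cube" by the recognizer.
    print(referent_posterior(["bolt", "cube"], "cube"))
```

In this sketch the object labelled "cube" receives most of the probability mass, but the object labelled "bolt" keeps a nonzero share because both the recognizer and the speaker may be imprecise. That residual uncertainty is what allows evidence from one modality to correct an error in the other.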