Creating a human-robot interface is a daunting task. The capabilities and functionality of such an interface depend on the robustness of many different sensor and input modalities. For example, object recognition poses problems for state-of-the-art vision systems; speech recognition in noisy environments remains problematic for acoustic systems; natural language understanding and dialog are often limited to specific domains and baffled by ambiguous or novel utterances; plans based on domain-specific tasks limit the applicability of dialog managers; and the types of sensors used limit spatial knowledge and understanding, constraining cognitive capabilities such as perspective-taking.

In this research, we integrate several modalities, including vision, audition, and natural language understanding, to leverage the existing strengths of each modality and overcome individual weaknesses. We use visual, acoustic, and linguistic inputs in various combinations to solve problems such as disambiguating referents (objects in the environment), localizing human speakers, and determining the source of an utterance and the appropriateness of a response when humans and robots interact. For this research, we limit our consideration to the interaction of two humans and one robot in a retrieval scenario. This paper describes the system and the integration of its modules prior to future testing.
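One way to picture the kind of multimodal combination described above is a probabilistic fusion of per-modality evidence over candidate referents. The sketch below is purely illustrative and not drawn from the paper: the function name, the modalities chosen, the uniform prior, and the example scores are all assumptions, shown only to make the idea of cross-modal referent disambiguation concrete.

```python
# Illustrative sketch (not the authors' system): each modality scores every
# candidate referent, and the scores are multiplied into a posterior,
# assuming a uniform prior and conditionally independent modalities.

def fuse_referent_scores(candidates, modality_scores):
    """Combine per-modality likelihoods for each candidate referent.

    candidates: list of object identifiers
    modality_scores: dict mapping modality name -> {candidate: likelihood}
    Returns a normalized posterior over candidates.
    """
    posterior = {c: 1.0 for c in candidates}
    for scores in modality_scores.values():
        for c in candidates:
            # Small floor so missing evidence does not zero out a candidate.
            posterior[c] *= scores.get(c, 1e-6)
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

# Hypothetical example: speech says "the red one", a gesture points
# toward the ball on the left.
candidates = ["red_ball", "blue_ball", "red_cup"]
scores = {
    "speech":  {"red_ball": 0.45, "blue_ball": 0.10, "red_cup": 0.45},
    "gesture": {"red_ball": 0.70, "blue_ball": 0.20, "red_cup": 0.10},
}
posterior = fuse_referent_scores(candidates, scores)
best = max(posterior, key=posterior.get)  # "red_ball": each modality alone is ambiguous
```

Neither modality resolves the referent on its own (speech cannot separate the two red objects; the gesture cone covers both balls), but their product does, which is the leverage multimodal integration is meant to provide.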