Creating a human-robot interface is a daunting task. The capabilities and functionality of such an interface depend on the robustness of many different sensor and input modalities. For example, object recognition poses problems for state-of-the-art vision systems; speech recognition in noisy environments remains problematic for acoustic systems; natural language understanding and dialog are often limited to specific domains and baffled by ambiguous or novel utterances; plans based on domain-specific tasks limit the applicability of dialog managers; and the types of sensors used limit spatial knowledge and understanding, constraining cognitive capabilities such as perspective-taking.

In this research, we integrate several modalities, including vision, audition, and natural language understanding, to leverage the existing strengths of each modality and overcome individual weaknesses. We use visual, acoustic, and linguistic inputs in various combinations to solve problems such as disambiguating referents (objects in the environment), localizing human speakers, and determining the source of an utterance and the appropriateness of a response when humans and robots interact. For this research, we limit our consideration to the interaction of two humans and one robot in a retrieval scenario. This paper describes the system and the integration of its modules prior to future testing.
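One way to picture the kind of multimodal combination described above is a probabilistic fusion of per-modality evidence over candidate referents. The sketch below is purely illustrative and not drawn from the paper: the function name, the modalities chosen, the uniform prior, and the example scores are all assumptions, shown only to make the idea of cross-modal referent disambiguation concrete.

```python
# Illustrative sketch (not the authors' system): each modality scores every
# candidate referent, and the scores are multiplied into a posterior,
# assuming a uniform prior and conditionally independent modalities.

def fuse_referent_scores(candidates, modality_scores):
    """Combine per-modality likelihoods for each candidate referent.

    candidates: list of object identifiers
    modality_scores: dict mapping modality name -> {candidate: likelihood}
    Returns a normalized posterior over candidates.
    """
    posterior = {c: 1.0 for c in candidates}
    for scores in modality_scores.values():
        for c in candidates:
            # Small floor so missing evidence does not zero out a candidate.
            posterior[c] *= scores.get(c, 1e-6)
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

# Hypothetical example: speech says "the red one", a gesture points
# toward the ball on the left.
candidates = ["red_ball", "blue_ball", "red_cup"]
scores = {
    "speech":  {"red_ball": 0.45, "blue_ball": 0.10, "red_cup": 0.45},
    "gesture": {"red_ball": 0.70, "blue_ball": 0.20, "red_cup": 0.10},
}
posterior = fuse_referent_scores(candidates, scores)
best = max(posterior, key=posterior.get)  # "red_ball": each modality alone is ambiguous
```

Neither modality resolves the referent on its own (speech cannot separate the two red objects; the gesture cone covers both balls), but their product does, which is the leverage multimodal integration is meant to provide.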