A multi-modal approach for natural human-robot interaction

  • Authors:
  • Thomas Kollar; Anu Vedantham; Corey Sobel; Cory Chang; Vittorio Perera; Manuela Veloso

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA (all authors)

  • Venue:
  • ICSR '12: Proceedings of the 4th International Conference on Social Robotics
  • Year:
  • 2012


Abstract

We present a robot that interacts with people in a natural, multi-modal way using both speech and gesture. To track people and recognize gestures, the robot uses an RGB-D sensor (e.g., a Microsoft Kinect); to recognize speech, it uses a cloud-based service; and to understand language, it uses a probabilistic graphical model to infer the meaning of a natural language query. We evaluated the system in two domains. The first is a robot receptionist (roboceptionist): the roboceptionist interacts successfully with people 77% of the time when they are primed with the robot's capabilities, compared to 57% when they are not. The second is a mobile service robot that interacts with people via natural language.
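
As a concrete illustration of how the three components named in the abstract (RGB-D person/gesture tracking, cloud speech recognition, and a probabilistic graphical model for language understanding) might fit together, the Python sketch below composes them into a single interaction step. This is our illustration under stated assumptions, not the authors' implementation: every class, method, and the toy per-word factor scoring is a hypothetical placeholder.

```python
import math

# Hypothetical sketch of the multi-modal pipeline described in the abstract.
# None of these interfaces come from the paper; they are placeholders that
# show how tracking, speech recognition, and grounding could be composed.

def most_likely_grounding(words, candidates, potential):
    """Pick the candidate meaning that maximizes a product of per-word
    factor potentials -- a toy stand-in for inference in a probabilistic
    graphical model over the query."""
    def log_score(candidate):
        return sum(math.log(potential(w, candidate)) for w in words)
    return max(candidates, key=log_score)

class MultiModalRobot:
    def __init__(self, tracker, speech, candidates, potential):
        self.tracker = tracker        # RGB-D (e.g., Kinect) person/gesture tracking
        self.speech = speech          # cloud-based speech recognizer
        self.candidates = candidates  # possible groundings (places, actions, ...)
        self.potential = potential    # word/grounding compatibility factor

    def interact_once(self):
        """One interaction: detect a person, transcribe their utterance,
        and infer the most likely meaning of what they said."""
        person = self.tracker.detect_person()
        if person is None:
            return None                          # nobody nearby to talk to
        audio = self.tracker.record_utterance(person)
        words = self.speech.transcribe(audio).split()
        return most_likely_grounding(words, self.candidates, self.potential)
```

Factoring the score over the words of the query is one simple way a graphical model can rank candidate meanings; the paper's actual model and inference procedure are described in the full text.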