Multi-modal features for real-time detection of human-robot interaction categories

  • Authors:
  • Ian R. Fasel;Masahiro Shiomi;Philippe-Emmanuel Chadutaud;Takayuki Kanda;Norihiro Hagita;Hiroshi Ishiguro

  • Affiliations:
  • University of Arizona, Tucson, AZ, USA;Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan;Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan;Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan;Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan;Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan

  • Venue:
  • Proceedings of the 2009 International Conference on Multimodal Interfaces
  • Year:
  • 2009

Abstract

Social interactions unfold over time, at multiple time scales, and can be observed through multiple sensory modalities. In this paper, we propose a machine learning framework for selecting and combining low-level sensory features from different modalities to produce high-level characterizations of human-robot social interactions in real-time. We introduce a novel set of fast, multi-modal, spatio-temporal features for audio sensors, touch sensors, floor sensors, laser range sensors, and the time-series history of the robot's own behaviors. A subset of these features is automatically selected and combined using GentleBoost, an ensemble machine learning technique, allowing the robot to make an estimate of the current interaction category every 100 milliseconds. This information can then be used either by the robot to make decisions autonomously or by a remote human operator who can modify the robot's behavior manually (i.e., semi-autonomous operation). We demonstrate the technique on an information-kiosk robot deployed in a busy train station, focusing on the problem of detecting interaction breakdowns (i.e., failure of the robot to engage in a good interaction). We show that despite the varied and unscripted nature of human-robot interactions in the real-world train-station setting, the robot can achieve highly accurate predictions of interaction breakdowns at the same instant human observers become aware of them.
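
To make the feature-selection step concrete, here is a minimal sketch of GentleBoost (Friedman, Hastie, and Tibshirani, 2000) with single-feature regression stumps as weak learners. Because each boosting round fits a stump on one feature dimension, the ensemble implicitly selects a subset of features, as the abstract describes. This is an illustration under assumed details, not the authors' implementation: the function and variable names (`fit_stump`, `gentleboost_fit`, `n_rounds`) are hypothetical, and the paper's actual weak learners and thresholding scheme may differ.

```python
# Sketch of GentleBoost with regression stumps for binary labels y in {-1, +1}.
# Each round fits a stump by weighted least squares, adds it to the additive
# model F, and reweights examples by exp(-y * f). Illustrative only.
import numpy as np

def fit_stump(X, y, w):
    """Best weighted least-squares stump over all features.

    The stump predicts f(x) = a * [x[j] > theta] + b; the weighted mean of y
    on each side of the threshold minimizes the weighted squared error.
    Returns (feature index j, threshold theta, a, b).
    """
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            mask = X[:, j] > theta
            w_hi, w_lo = w[mask].sum(), w[~mask].sum()
            if w_hi == 0 or w_lo == 0:
                continue  # degenerate split, skip
            mu_hi = (w[mask] * y[mask]).sum() / w_hi
            mu_lo = (w[~mask] * y[~mask]).sum() / w_lo
            err = (w * (y - np.where(mask, mu_hi, mu_lo)) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, theta, mu_hi - mu_lo, mu_lo)
    return best

def gentleboost_fit(X, y, n_rounds=50):
    """Fit an additive model F(x) = sum_m f_m(x) by GentleBoost."""
    w = np.full(len(y), 1.0 / len(y))  # uniform initial example weights
    ensemble = []
    for _ in range(n_rounds):
        j, theta, a, b = fit_stump(X, y, w)
        f = a * (X[:, j] > theta) + b
        w *= np.exp(-y * f)   # down-weight examples the stump gets right
        w /= w.sum()          # renormalize
        ensemble.append((j, theta, a, b))
    return ensemble

def gentleboost_predict(ensemble, X):
    """Classify by the sign of the additive model F(x)."""
    F = np.zeros(len(X))
    for j, theta, a, b in ensemble:
        F += a * (X[:, j] > theta) + b
    return np.sign(F)
```

In a setting like the one described, `X` would hold one row of multi-modal spatio-temporal features per 100 ms time step, with `y = +1` marking breakdown frames; the features chosen across rounds (the `j` indices in the ensemble) constitute the automatically selected subset, and the real-valued margin `F(x)` can serve as a confidence score for a remote operator.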