One-shot visual appearance learning for mobile manipulation

Authors:
Matthew R Walter; Yuli Friedman; Matthew Antone; Seth Teller
Affiliations:
MIT CS & AI Lab (CSAIL), Cambridge, MA, USA; BAE Systems, Burlington, MA, USA; BAE Systems, Burlington, MA, USA; MIT CS & AI Lab (CSAIL), Cambridge, MA, USA
Venue:
International Journal of Robotics Research
Year:
2012

Abstract

We describe a vision-based algorithm that enables a robot to robustly detect specific objects in a scene following an initial segmentation hint from a human user. The novelty lies in the ability to 'reacquire' objects over extended spatial and temporal excursions within challenging environments based upon a single training example. The primary difficulty lies in achieving an effective reacquisition capability that is robust to the effects of local clutter, lighting variation, and object relocation. We overcome these challenges through an adaptive detection algorithm that automatically generates multiple-view appearance models for each object online. As the robot navigates within the environment and the object is detected from different viewpoints, the one-shot learner opportunistically and automatically incorporates additional observations into each model. In order to overcome the effects of 'drift' common to adaptive learners, the algorithm imposes simple requirements on the geometric consistency of candidate observations. Motivating our reacquisition strategy is our work developing a mobile manipulator that interprets and autonomously performs commands conveyed by a human user. The ability to detect specific objects and reconstitute the user's segmentation hints enables the robot to be situationally aware. This situational awareness enables rich command and control mechanisms and affords natural interaction. We demonstrate one such capability that allows the human to give the robot a 'guided tour' of named objects within an outdoor environment and, hours later, to direct the robot to manipulate those objects by name using spoken instructions. We implemented our appearance-based detection strategy on our robotic manipulator as it operated over multiple days in different outdoor environments. We evaluate the algorithm's performance under challenging conditions that include scene clutter, lighting and viewpoint variation, object ambiguity, and object relocation. The results demonstrate a reacquisition capability that is effective in real-world settings.
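
The adaptive update described in the abstract (accept a candidate observation only if it is geometrically consistent with an existing view, then grow the multiple-view model) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that features are matched by an integer id (a stand-in for descriptor matching), that geometric consistency means agreement with a single 2D similarity transform, and that the transform is fitted exhaustively over feature pairs rather than by random sampling. All names (`MultiViewModel`, `consistent_matches`, `fit_similarity`) and thresholds (`tol`, `min_inliers`, `novelty`) are hypothetical.

```python
from itertools import combinations


def fit_similarity(p1, p2, q1, q2):
    """Fit z -> a*z + b (complex arithmetic) so that p1 -> q1 and p2 -> q2."""
    if p1 == p2:
        return None  # degenerate pair: no unique transform
    a = (q2 - q1) / (p2 - p1)  # encodes rotation and scale
    b = q1 - a * p1            # encodes translation
    return a, b


def consistent_matches(view, observation, tol=3.0):
    """Return (inlier_ids, transform) for the best-supported similarity transform.

    view, observation: dicts mapping a feature id to a complex pixel
    position x + 1j*y. Matching by id is an assumption of this sketch.
    """
    shared = sorted(set(view) & set(observation))
    best_ids, best_t = [], None
    for i, j in combinations(shared, 2):
        t = fit_similarity(view[i], view[j], observation[i], observation[j])
        if t is None:
            continue
        a, b = t
        inliers = [k for k in shared
                   if abs(a * view[k] + b - observation[k]) <= tol]
        if len(inliers) > len(best_ids):
            best_ids, best_t = inliers, t
    return best_ids, best_t


class MultiViewModel:
    """Appearance model seeded from one example and grown one view at a time."""

    def __init__(self, first_view, min_inliers=4, novelty=0.2):
        self.views = [dict(first_view)]  # the single training example
        self.min_inliers = min_inliers   # hypothetical acceptance threshold
        self.novelty = novelty           # hypothetical scale-change threshold

    def update(self, observation):
        """Return True if the object is detected; add a view if it looks new."""
        best_ids, best_t = [], None
        for view in self.views:
            ids, t = consistent_matches(view, observation)
            if len(ids) > len(best_ids):
                best_ids, best_t = ids, t
        if len(best_ids) < self.min_inliers:
            return False  # inconsistent geometry: leave the model unchanged
        scale = abs(best_t[0])
        if abs(scale - 1.0) > self.novelty:
            # Appearance changed noticeably: store the inliers as a new view.
            self.views.append({k: observation[k] for k in best_ids})
        return True


first = {0: 0 + 0j, 1: 10 + 0j, 2: 0 + 10j, 3: 10 + 10j, 4: 5 + 3j}
model = MultiViewModel(first)
closer = {k: 1.5 * p + (20 + 20j) for k, p in first.items()}
detected = model.update(closer)  # consistent, seen at a larger scale
```

In this sketch, rejecting observations with too few geometric inliers is what guards against drift: an inconsistent candidate never modifies the stored views, so mistaken detections cannot accumulate in the model.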