Learning the semantics of object-action relations by observation

  • Authors:
  • Eren Erdal Aksoy, Alexey Abramov, Johannes Dörr, Kejun Ning, Babette Dellen, Florentin Wörgötter

  • Affiliations:
  • All authors: Bernstein Center for Computational Neuroscience, University of Göttingen, III. Physikalisches Institut, Göttingen, Germany
  • Babette Dellen additionally: Institut de Robòtica i Informàtica Industrial (CSIC- ...

  • Venue:
  • International Journal of Robotics Research
  • Year:
  • 2011

Abstract

Recognizing manipulations performed by a human, and transferring and executing them with a robot, is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. In this way, we encode the essential changes in a visual scene in a condensed form such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of these graphs occur whenever a spatial relation between segments changes in a discontinuous way; these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared, and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
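To make the comparison idea concrete, below is a minimal Python sketch of matching two SECs. It assumes a SEC is a matrix with one row per segment pair and one column per decisive time point, with integer codes for spatial relations (e.g. 0 = no contact, 1 = touching, 2 = overlapping); the toy data, the relation codes, and the greedy row-matching via longest-common-subsequence ratios are illustrative assumptions, not the paper's exact encoding or sub-string algorithm.

```python
# Hypothetical SEC comparison sketch; relation codes and the greedy
# row matching are illustrative, not the authors' exact method.
from difflib import SequenceMatcher

def row_similarity(row_a, row_b):
    """Similarity in [0, 1] of two relation sequences (sub-string style)."""
    a = "".join(map(str, row_a))
    b = "".join(map(str, row_b))
    return SequenceMatcher(None, a, b).ratio()

def sec_similarity(sec_a, sec_b):
    """Average best-match score of each row of sec_a against rows of sec_b."""
    scores = [max(row_similarity(row_a, row_b) for row_b in sec_b)
              for row_a in sec_a]
    return sum(scores) / len(scores)

# Two toy SECs for type-similar "pushing" manipulations of different length.
sec_push_1 = [[0, 1, 1, 0],   # hand-object: approach, touch, touch, release
              [1, 1, 2, 1]]   # object-support: contact, contact, overlap, contact
sec_push_2 = [[0, 1, 0],
              [1, 2, 1]]

print(f"SEC similarity: {sec_similarity(sec_push_1, sec_push_2):.2f}")
```

A high score here would indicate that the two event chains encode the same sequence of relational changes, which is the cue the paper uses to recognize type-similar manipulations independently of object identity.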