Unsupervised modeling of partially observable environments

  • Authors:
  • Vincent Graziano; Jan Koutník; Jürgen Schmidhuber

  • Affiliations:
  • IDSIA, SUPSI, University of Lugano, Manno, Switzerland (all three authors)

  • Venue:
  • ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part I
  • Year:
  • 2011

Abstract

We present an architecture based on self-organizing maps for learning a sensory layer in a learning system. The architecture, Temporal Network for Transitions (TNT), enjoys the freedoms of unsupervised learning, works online in non-episodic environments, is computationally light, and scales well. TNT generates a predictive model of its internal representation of the world, making planning methods available for both exploitation and exploration of the environment. Experiments demonstrate that TNT learns good representations of classical reinforcement learning mazes of varying size (up to 20 × 20) under high observation noise and stochastic actions.
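The abstract describes the architecture only at a high level. As a hedged illustration, and explicitly not the authors' TNT algorithm, the Python sketch below pairs a plain online self-organizing map, which quantizes noisy observations into discrete nodes, with a count table over (node, action, next node) triples that acts as a simple predictive model of transitions between internal states. The class and method names, the 1-D node chain used for the neighborhood, the learning rate, the neighborhood width, and the toy corridor environment are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


class SOMTransitionModel:
    """Illustrative sketch only (NOT the TNT algorithm): an online
    self-organizing map (SOM) that quantizes observations into discrete
    nodes, plus (node, action) -> next-node counts used as a predictive
    model. All names and parameters here are hypothetical."""

    def __init__(self, n_nodes, dim, lr=0.2, sigma=1.0, seed=0):
        rng = random.Random(seed)
        # Node weight vectors, randomly initialized in [0, 1).
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
        self.lr = lr        # fixed learning rate for weight updates (assumption)
        self.sigma = sigma  # neighborhood width on a 1-D chain of nodes (assumption)
        # counts[(node, action)][next_node] -> number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def bmu(self, x):
        """Index of the best-matching unit (closest weight vector to x)."""
        return min(range(len(self.weights)),
                   key=lambda i: sum((w - v) ** 2 for w, v in zip(self.weights[i], x)))

    def adapt(self, x):
        """One online SOM step: pull the BMU and its chain neighbors toward x."""
        b = self.bmu(x)
        for i, w in enumerate(self.weights):
            h = math.exp(-((i - b) ** 2) / (2 * self.sigma ** 2))  # Gaussian neighborhood
            for k in range(len(w)):
                w[k] += self.lr * h * (x[k] - w[k])
        return b

    def record(self, node, action, next_node):
        """Count one observed transition between map nodes under an action."""
        self.counts[(node, action)][next_node] += 1

    def predict(self, node, action):
        """Empirical distribution over next nodes; empty dict if never seen."""
        row = self.counts.get((node, action), {})  # .get does not create entries
        total = sum(row.values())
        return {n: c / total for n, c in row.items()} if total else {}


def demo(steps=2000, noise=0.05, seed=1):
    """Hypothetical toy corridor with 4 cells and no episodes: actions -1/+1
    succeed with probability 0.9; observations are the scaled cell index
    plus Gaussian noise."""
    rng = random.Random(seed)
    model = SOMTransitionModel(n_nodes=4, dim=1, seed=seed)

    def observe(p):
        return [p / 3 + rng.gauss(0, noise)]

    pos = 0
    node = model.adapt(observe(pos))
    for _ in range(steps):
        action = rng.choice((-1, 1))
        if rng.random() < 0.9:  # stochastic action: occasionally has no effect
            pos = min(3, max(0, pos + action))
        next_node = model.adapt(observe(pos))
        model.record(node, action, next_node)
        node = next_node
    return model


if __name__ == "__main__":
    m = demo()
    start = m.bmu([0.0])
    print("P(next node | node for cell 0, action +1):", m.predict(start, 1))
```

Because the transition table is defined over map nodes rather than raw observations, a planner (for example, a graph search over the most likely next nodes) could in principle operate on it directly. This sketch keeps a fixed learning rate and neighborhood width, and it retains transition counts gathered before the map has settled; both are simplifications, and the abstract does not say how TNT handles either issue.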