A New Architecture for Learning Classifier Systems to Solve POMDP Problems

  • Authors:
  • Ali Hamzeh; Adel Rahmani

  • Affiliations:
  • (Correspd.) Computer Engineering Department, Iran University of Science and Technology, Teheran, Iran. E-mail: hamzeh@iust.ac.ir, rahmani@iust.ac.ir

  • Venue:
  • Fundamenta Informaticae
  • Year:
  • 2008

Abstract

Reinforcement learning (RL) is a learning paradigm in which an agent learns to act optimally in an unknown environment through trial and error. An RL-based agent senses the state of its environment, proposes an action, and applies it to the environment. The environment then returns a reinforcement signal, called the reward, and the agent is expected to learn through its internal mechanisms how to maximize the overall reward. One of the most challenging issues in RL arises from the limited sensory ability of the agent: it may not be able to sense its current environmental state completely. Environments in which this happens are called partially observable environments. In such an environment, the agent may fail to distinguish the actual environmental state and may therefore fail to propose the optimal action in particular states, so an extended mechanism must be added to the agent's architecture to enable it to perform optimally. Separately, one of the most widely used approaches to reinforcement learning is evolutionary learning, and one of the most widely used techniques in this family is the learning classifier system. Learning classifier systems try to evolve state-action-reward mappings that model their current environment through trial and error. In this paper we propose a new architecture for learning classifier systems that is able to perform optimally in partially observable environments. The architecture uses a novel method to detect aliased states in the environment and disambiguates them through multiple instances of classifier systems that interact with the environment in parallel. The model is applied to several well-known benchmark problems and compared with some of the best classifier systems proposed for these environments. Our results and detailed discussion show that our approach is among the best-performing learning classifier systems in partially observable environments.
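
To make the notion of an aliased state concrete, the following minimal sketch builds a toy corridor in which several cells produce the same observation but require different optimal actions. Everything in it is an illustrative assumption and not taken from the paper: the 5-cell corridor, the names `observe`, `optimal_action`, and `find_aliased_observations`, and the use of known optimal actions as an oracle. It is not the detection method proposed by the authors, which the abstract does not describe.

```python
from collections import defaultdict

# Hypothetical toy environment (an assumption for illustration only):
# a corridor of 5 cells with the goal in the middle cell.
N_CELLS = 5
GOAL = 2


def observe(state):
    """Return what the agent can sense: (wall on the left, wall on the right)."""
    return (state == 0, state == N_CELLS - 1)


def optimal_action(state):
    """Oracle: the action that moves the agent toward the goal."""
    return "right" if state < GOAL else "left"


def find_aliased_observations():
    """Group non-goal states by observation and keep the observations
    whose underlying states need different optimal actions."""
    actions_by_obs = defaultdict(set)
    for state in range(N_CELLS):
        if state == GOAL:
            continue  # no action is needed once the goal is reached
        actions_by_obs[observe(state)].add(optimal_action(state))
    return {obs: acts for obs, acts in actions_by_obs.items() if len(acts) > 1}


if __name__ == "__main__":
    for obs, acts in find_aliased_observations().items():
        print("aliased observation", obs, "needs actions", sorted(acts))
```

In this toy corridor, cells 1 and 3 both yield the observation `(False, False)`, yet one requires moving right and the other moving left, so no single observation-to-action rule can be optimal in both. This is the kind of ambiguity that an extended mechanism, such as memory or multiple cooperating classifier systems, has to resolve.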