On characteristics of Markov decision processes and reinforcement learning in large domains

  • Authors:
  • Bohdana Ratitch

  • Affiliations:
  • McGill University (Canada)

  • Venue:
  • Ph.D. thesis, McGill University
  • Year:
  • 2005

Abstract

Reinforcement learning is a general computational framework for learning sequential decision strategies from the interaction of an agent with a dynamic environment. In this thesis, we focus on value-based learning methods, which rely on computing utility values for different behavior strategies. Value-based reinforcement learning methods have a solid theoretical foundation and a growing history of successful applications to real-world problems. However, most existing theoretically sound algorithms work only for small problems. For complex real-world decision tasks, approximate methods have to be used, and in this case there is a significant gap between the existing theoretical results and the methodologies applied in practice. This thesis is devoted to analyzing the factors that contribute to the difficulty of learning with popular reinforcement learning algorithms, as well as to developing new methods that facilitate the practical application of reinforcement learning techniques.

In the first part of this thesis, we investigate properties of reinforcement learning tasks that influence the performance of value-based algorithms. We present five domain-independent quantitative attributes that can be used to measure various task characteristics. We study the effect of these characteristics on learning and how they can be used to improve the efficiency of existing algorithms. In particular, we develop one application that uses measurements of the proposed attributes to improve exploration (the process by which the agent gathers experience for learning good behavior strategies).

In large, realistic domains, function approximation methods have to be incorporated into reinforcement learning algorithms. The second part of this thesis is devoted to the use of a function approximation model based on Sparse Distributed Memories (SDMs) in approximate value-based methods. As with all other function approximators, the success of using SDMs in reinforcement learning depends, to a large extent, on a good choice of the structure of the approximator. We propose a new technique for automatically selecting certain structural parameters of the SDM model on-line, based on training data. Our algorithm takes into account the interaction of function approximation with reinforcement learning algorithms and avoids some of the difficulties faced by other methods in the existing literature. In our experiments, this method provides very good performance and is computationally efficient.
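
To make the notion of a value-based method concrete, the sketch below shows tabular Q-learning with epsilon-greedy exploration, the textbook example of learning utility values from interaction. It is illustrative only, not the specific algorithm studied in the thesis; the environment interface (`env.reset()`, `env.step()`, `env.actions`) is a hypothetical minimal one assumed for the example.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""
    Q = defaultdict(float)  # state-action utility estimates, default 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Exploration vs. exploitation: occasionally try a random action.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Bootstrap from the current estimate of the best next action.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The Sparse Distributed Memory model mentioned in the abstract stores values at a set of prototype "hard locations" and answers a query by combining the values of the locations activated near the query state. The class below is an illustrative sketch of that idea under simple assumptions (Euclidean distance, a fixed activation radius, a triangular similarity kernel); it does not reproduce the thesis's actual model or its on-line structure-selection technique, and the class and parameter names are hypothetical.

```python
import numpy as np

class SDMValueApproximator:
    """Sketch of an SDM-style value approximator over a continuous state space."""

    def __init__(self, hard_locations, radius, learning_rate=0.1):
        self.locations = np.asarray(hard_locations, dtype=float)  # (n, d) prototype states
        self.values = np.zeros(len(self.locations))               # value stored at each location
        self.radius = radius
        self.learning_rate = learning_rate

    def _activations(self, state):
        # Locations within `radius` of the query are active; closer means stronger.
        dist = np.linalg.norm(self.locations - state, axis=1)
        act = np.maximum(0.0, 1.0 - dist / self.radius)
        total = act.sum()
        return act / total if total > 0.0 else act

    def predict(self, state):
        # Value estimate: similarity-weighted average of the active locations' values.
        return float(self._activations(np.asarray(state, dtype=float)) @ self.values)

    def update(self, state, target):
        # Move the active locations' values toward a target (e.g., a bootstrapped return).
        act = self._activations(np.asarray(state, dtype=float))
        error = target - float(act @ self.values)
        self.values += self.learning_rate * error * act
```

In an approximate value-based learner, `predict` would take the place of the table lookup `Q[(state, action)]` and `update` the place of the tabular increment, with one such approximator per action being a common arrangement.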