Sparse gradient-based direct policy search

  • Authors: Nataliya Sokolovska

  • Affiliations: Department of Computing, Macquarie University, Sydney, Australia

  • Venue: ICONIP'12 Proceedings of the 19th International Conference on Neural Information Processing - Volume Part IV
  • Year: 2012


Abstract

Reinforcement learning is challenging when state and action spaces are continuous. Discretizing the state and action spaces, and adapting that discretization in real time, are critical issues in reinforcement learning problems. In this contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address the issue of efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L1 norm. We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which adaptively updates the discretization of the states, and to classic as well as sparse Q-learning with linear function approximation. Our experiments on standard reinforcement learning challenges demonstrate that the proposed approach is efficient.
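
To make the idea concrete, the sketch below shows one plausible instantiation of the general recipe the abstract describes: a gradient-based direct policy search (here a REINFORCE-style update for a softmax policy over a fine state discretization) combined with an L1 penalty that is applied through a proximal soft-thresholding step, so that weights of irrelevant state cells are driven exactly to zero. This is an illustrative assumption-laden sketch, not the paper's exact algorithm; the class name, hyperparameters, and the choice of REINFORCE as the gradient estimator are hypothetical.

```python
# Illustrative sketch (not the paper's exact algorithm): REINFORCE-style
# direct policy search with a softmax policy over a fine state discretization.
# Sparsity in the policy parameters is induced by an L1 penalty applied as a
# proximal (soft-thresholding) step after each gradient-ascent update.
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of the L1 norm: shrinks parameters toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

class SparsePolicySearch:
    def __init__(self, n_states, n_actions, lr=0.05, l1=0.01, gamma=0.99):
        # One weight per (discretized state cell, action); hyperparameters are illustrative.
        self.theta = np.zeros((n_states, n_actions))
        self.lr, self.l1, self.gamma = lr, l1, gamma

    def action_probs(self, s):
        z = self.theta[s] - self.theta[s].max()   # numerically stable softmax
        e = np.exp(z)
        return e / e.sum()

    def act(self, s, rng):
        return rng.choice(self.theta.shape[1], p=self.action_probs(s))

    def update(self, episode):
        """episode: list of (state, action, reward) tuples from one rollout."""
        # Discounted returns-to-go for each step of the episode.
        G, returns = 0.0, []
        for (_, _, r) in reversed(episode):
            G = r + self.gamma * G
            returns.append(G)
        returns.reverse()
        # Policy-gradient estimate: grad log pi(a|s) = indicator(a) - softmax(s).
        grad = np.zeros_like(self.theta)
        for (s, a, _), G_t in zip(episode, returns):
            p = self.action_probs(s)
            grad[s] -= p * G_t
            grad[s, a] += G_t
        self.theta += self.lr * grad                                  # gradient ascent on return
        self.theta = soft_threshold(self.theta, self.lr * self.l1)   # L1 proximal step
```

Under this reading, the soft-thresholding step plays the role of the L1 penalty in the abstract: starting from a fine discretization, parameters of state cells that do not contribute to the return are shrunk to exactly zero, which effectively prunes the discretization during learning.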