Policy search using paired comparisons

  • Authors:
  • Malcolm J. A. Strens; Andrew W. Moore

  • Affiliations:
  • Guidance & Imaging Solutions, QinetiQ, Ively Road, Farnborough, Hampshire GU14 0LX, UK; School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one by using fixed start states and fixed random number sequences when comparing policies (Ng and Jordan, 2000). We evaluate Pegasus and new paired comparison methods on the mountain-car problem and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve the performance of optimization procedures; (ii) several methods are available to reduce the 'overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
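
The core mechanism behind Pegasus and paired comparisons is easy to sketch. The Python snippet below is a minimal illustration, not code from the paper: the one-dimensional toy objective and all names are hypothetical stand-ins for the paper's mountain-car and pursuer-evader tasks. It shows how fixing a common set of random seeds makes each rollout deterministic, so additive environment noise cancels when two policies are compared on the same scenarios.

```python
import random

def rollout_return(theta, seed):
    """Return from one simulated episode under policy parameter `theta`.

    Fixing `seed` fixes the start state and the random number sequence,
    so repeated evaluations of the same (theta, seed) pair are identical:
    the stochastic objective becomes a deterministic one (the Pegasus idea).
    This toy quadratic objective is a hypothetical stand-in for a real task.
    """
    rng = random.Random(seed)
    start = rng.uniform(-1.0, 1.0)        # fixed start state for this seed
    noise = rng.gauss(0.0, 0.5)           # fixed "environment" noise
    return -(theta - start) ** 2 + noise

def paired_comparison(theta_a, theta_b, seeds):
    """Compare two policies on a common set of scenarios (seeds).

    Both policies see the same start states and random sequences, so the
    shared noise cancels in each per-seed difference, giving a much
    lower-variance estimate of which policy is better than comparing
    independent rollouts would.
    """
    diffs = [rollout_return(theta_a, s) - rollout_return(theta_b, s)
             for s in seeds]
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    seeds = list(range(20))               # fixed scenario set, reused for all comparisons
    mean_diff = paired_comparison(0.3, -0.2, seeds)
    print("A better than B" if mean_diff > 0 else "B better than A")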