Policy search using paired comparisons

  • Authors:
  • Malcolm J. A. Strens; Andrew W. Moore

  • Affiliations:
  • Guidance & Imaging Solutions, QinetiQ, Ively Road, Farnborough, Hampshire GU14 0LX, UK; School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one by using fixed start states and fixed random number sequences when comparing policies (Ng and Jordan, 2000). We evaluate Pegasus and new paired comparison methods on the mountain-car problem and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve the performance of optimization procedures; (ii) several methods are available to reduce the 'overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
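
The core mechanism behind Pegasus and paired comparisons is easy to sketch. The Python snippet below is a minimal illustration, not code from the paper: the one-dimensional toy objective and all names are hypothetical stand-ins for the paper's mountain-car and pursuer-evader tasks. It shows how fixing a common set of random seeds makes each rollout deterministic, so additive environment noise cancels when two policies are compared on the same scenarios.

```python
import random

def rollout_return(theta, seed):
    """Return from one simulated episode under policy parameter `theta`.

    Fixing `seed` fixes the start state and the random number sequence,
    so repeated evaluations of the same (theta, seed) pair are identical:
    the stochastic objective becomes a deterministic one (the Pegasus idea).
    This toy quadratic objective is a hypothetical stand-in for a real task.
    """
    rng = random.Random(seed)
    start = rng.uniform(-1.0, 1.0)        # fixed start state for this seed
    noise = rng.gauss(0.0, 0.5)           # fixed "environment" noise
    return -(theta - start) ** 2 + noise

def paired_comparison(theta_a, theta_b, seeds):
    """Compare two policies on a common set of scenarios (seeds).

    Both policies see the same start states and random sequences, so the
    shared noise cancels in each per-seed difference, giving a much
    lower-variance estimate of which policy is better than comparing
    independent rollouts would.
    """
    diffs = [rollout_return(theta_a, s) - rollout_return(theta_b, s)
             for s in seeds]
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    seeds = list(range(20))               # fixed scenario set, reused for all comparisons
    mean_diff = paired_comparison(0.3, -0.2, seeds)
    print("A better than B" if mean_diff > 0 else "B better than A")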