Reinforcement Learning
Training Reinforcement Neurocontrollers Using the Polytope Algorithm. Neural Processing Letters.
Direct Policy Search using Paired Statistical Tests. Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01).
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. Proceedings of the Sixteenth International Conference on Machine Learning (ICML '99).
Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI '99).
PEGASUS: A policy search method for large MDPs and POMDPs. Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI '00).
Direct Policy Search and Uncertain Policy Evaluation.
Learning evaluation functions to improve optimization by local search. The Journal of Machine Learning Research.
Bayesian sparse sampling for on-line reward optimization. Proceedings of the 22nd International Conference on Machine Learning (ICML '05).
On policy learning in restricted policy spaces. Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI '07), Volume 2.
Lazy paired hyper-parameter tuning. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI '13).
Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one by using fixed start states and fixed random number sequences when comparing policies (Ng and Jordan, 2000). We evaluate Pegasus, along with new paired comparison methods, on the mountain car problem and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve the performance of optimization procedures; (ii) several methods are available to reduce the 'overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
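The fixed-seed pairing idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's benchmarks: the quadratic objective and Gaussian noise model are hypothetical stand-ins for an RL return, but the mechanism is the same — evaluating both policies under the same random seeds (common random numbers) makes the shared noise cancel in each paired difference, so policies can be ranked reliably from few trials.

```python
import random

def evaluate(policy_param, seed):
    """Noisy return of a policy: a toy quadratic objective (peak at 0.5)
    plus seeded noise standing in for environment stochasticity."""
    rng = random.Random(seed)
    true_value = -(policy_param - 0.5) ** 2
    return true_value + rng.gauss(0.0, 0.1)

def paired_compare(param_a, param_b, seeds):
    """Evaluate both policies on the SAME seeds, as in Pegasus, and
    return the mean paired difference in return (positive favors a)."""
    diffs = [evaluate(param_a, s) - evaluate(param_b, s) for s in seeds]
    return sum(diffs) / len(diffs)

seeds = list(range(30))
# With identical seeds the noise terms cancel exactly in each pair,
# so the comparison is deterministic given the seed set.
better = 0.5 if paired_compare(0.5, 0.0, seeds) > 0 else 0.0
```

An unpaired comparison (independent seeds for each policy) would instead need the mean difference to dominate the combined noise of both evaluations, requiring many more trials for the same decision.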