In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods pursue this goal by directly approximating the policy with a parametric function approximator: the expected return of the current policy is estimated, and the policy parameters are updated by steepest ascent in the direction of the gradient of the expected return with respect to those parameters. In general, the policy is defined in terms of a set of basis functions that capture important features of the problem. Since the quality of the resulting policies depends directly on the set of basis functions, and defining them by hand becomes harder as the complexity of the problem increases, it is important to be able to construct them automatically. In this paper, we propose a new approach that uses the cascade-correlation learning architecture to automatically construct a set of basis functions within the context of Natural Actor-Critic (NAC) algorithms. Such basis functions allow more complex policies to be represented, and consequently improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically.
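The two ingredients described above, cascade-correlation-style basis functions and a parametric policy built on top of them, can be sketched as follows. This is a hypothetical Python illustration, not the authors' implementation: the choice of `tanh` units and a softmax (Gibbs) policy over linear combinations of the features are illustrative assumptions. The defining cascade-correlation property is kept, though: each new unit takes as input the raw state plus the outputs of all previously added units, and its input weights are frozen once the unit is installed.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

class CascadeFeatures:
    """Toy sketch of cascade-correlation-style basis construction.

    Each added unit sees the raw state *and* the outputs of all earlier
    units; its input weights are frozen after installation, as in
    Fahlman and Lebiere's cascade-correlation architecture.
    """
    def __init__(self, state_dim):
        self.state_dim = state_dim
        self.units = []  # list of (frozen_weights, bias) pairs

    def add_unit(self, weights, bias):
        # The new unit's fan-in is the state plus every existing unit.
        assert len(weights) == self.state_dim + len(self.units)
        self.units.append((np.asarray(weights, dtype=float), float(bias)))

    def __call__(self, state):
        # Feature vector: raw state followed by each unit's output,
        # computed in the order the units were added (cascaded inputs).
        feats = list(np.asarray(state, dtype=float))
        for w, b in self.units:
            feats.append(np.tanh(np.dot(w, np.array(feats)) + b))
        return np.array(feats)

def policy_probs(theta, features, n_actions):
    """Softmax policy over linear combinations of the basis functions."""
    logits = theta.reshape(n_actions, -1) @ features
    return softmax(logits)
```

Adding a unit grows the policy's feature vector by one dimension, so the policy parameter vector `theta` grows accordingly; this is how richer basis sets let more complex policies be represented.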