Automatic basis function construction for approximate dynamic programming and reinforcement learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision making by intelligent agents acting in stochastic environments. An important challenge facing such agents in practical applications is finding a suitable representation of the state space, so that a good policy can be learned efficiently. In this paper, we focus on learning a good policy when function approximation must be used to represent the value function. In this case, states are mapped to feature vectors, and a set of parameters is learned that allows us to approximate the value of any given state. In theory, the quality of the attainable approximation depends on the set of features; in practice, the feature set affects not only the quality of the solution obtained but also the speed of learning.
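To make the setup concrete, the following is a minimal sketch (not the paper's method) of linear value-function approximation: each state is mapped to a feature vector, and a parameter vector is learned so that the value estimate is the dot product of features and parameters. The environment here is a hypothetical deterministic 5-state chain with reward -1 per step, the feature map is hand-chosen (a bias term plus the normalized state index), and TD(0) is used as one standard way such parameters can be learned; all of these specifics are illustrative assumptions, not taken from the paper.

```python
# Hypothetical 5-state chain: from state s the agent moves to s+1 with
# reward -1; state 4 is terminal. True values are V(s) = -(4 - s).
N_STATES = 5

def features(s):
    # Hand-chosen feature map: bias term plus normalized state index.
    # The true value function V(s) = -4 + 4 * (s / 4) is exactly
    # representable with these two features.
    return [1.0, s / (N_STATES - 1)]

def td0_linear(episodes=1000, alpha=0.1, gamma=1.0):
    """TD(0) with linear function approximation on the chain above."""
    theta = [0.0, 0.0]
    for _ in range(episodes):
        s = 0
        while s < N_STATES - 1:
            s_next = s + 1
            r = -1.0
            phi = features(s)
            v = sum(t * f for t, f in zip(theta, phi))
            if s_next == N_STATES - 1:
                target = r  # terminal successor: no bootstrapping
            else:
                phi_next = features(s_next)
                v_next = sum(t * f for t, f in zip(theta, phi_next))
                target = r + gamma * v_next
            delta = target - v  # TD error
            theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
            s = s_next
    return theta

theta = td0_linear()
v0 = sum(t * f for t, f in zip(theta, features(0)))  # should be near -4
v3 = sum(t * f for t, f in zip(theta, features(3)))  # should be near -1
```

Because the true values happen to lie in the span of the chosen features, TD(0) can approximate them closely here; with a poorer feature set the same procedure would converge to a biased approximation, which is exactly why feature construction matters.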