The aim of this work is to identify an architecture that enables reactive navigation through unsupervised learning based on reinforcement learning. To reach this objective, we used Q-learning and a hierarchical structure in the developed architecture. Using these techniques in the presence of Partially Observable Markov Decision Processes (POMDPs) requires some innovations: heuristic techniques for generalizing experience and handling partial observability, a technique for fast updating of the Q function, and the definition of a reinforcement policy adequate for the unsupervised learning of a complex task. The results show satisfactory learning of the navigation task in a simulated environment.
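The abstract does not spell out the update rule the architecture builds on. As background, here is a minimal sketch of the standard tabular Q-learning update; the state and action names are illustrative, not taken from the paper:

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Tabular Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[next_state].values(), default=0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]

# Toy example with two states and two actions (hypothetical names).
Q = defaultdict(dict)
Q["s0"] = {"left": 0.0, "right": 0.0}
Q["s1"] = {"left": 1.0, "right": 0.0}

# Bootstraps from max_a' Q(s1, a') = 1.0:
# new value = 0.0 + 0.5 * (0.0 + 0.9 * 1.0 - 0.0) = 0.45
q_update(Q, "s0", "right", 0.0, "s1")
```

Under partial observability the true state is not directly available, which is why the paper layers heuristic treatments of the observation history on top of this basic rule.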