We present a new model-free reinforcement learning algorithm for learning to control partially observable Markov decision processes. The algorithm combines ideas from action-value-based reinforcement learning methods, such as Q-learning, with ideas from the stochastic optimization literature. Key to our approach is a new definition of action value, which makes the algorithm theoretically sound in partially observable settings. We show that special cases of the algorithm either converge with probability one to a locally optimal policy in the limit, or perform probably approximately correct hill-climbing to a locally optimal policy using a finite number of samples.
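To make the idea of hill-climbing over policies with sampled action values concrete, here is a minimal sketch. It is not the algorithm of the paper, whose definition of action value the abstract does not spell out. It assumes that the value of an observation-action pair can be read as the Monte Carlo return of the current memoryless policy with that one observation remapped to that action. The corridor environment, the function names (`observe`, `rollout`, `estimate`, `hill_climb`), and the parameters `n`, `eps`, and `slip` are all hypothetical, chosen only for illustration.

```python
import random

# Toy corridor (an assumption for illustration): states 0..3, goal at state 3.
N_STATES, GOAL, MAX_STEPS = 4, 3, 20
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def observe(state):
    # States 0 and 1 are aliased: both emit observation 0, state 2 emits 1.
    return 0 if state < 2 else 1


def rollout(policy, rng, slip=0.0):
    """Undiscounted return of one episode under a memoryless policy."""
    state, total = 0, 0.0
    for _ in range(MAX_STEPS):
        action = policy[observe(state)]
        if rng.random() < slip:  # with probability `slip` the action is flipped
            action = 1 - action
        state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
        total -= 1.0  # cost of one per step
        if state == GOAL:
            break
    return total


def estimate(policy, rng, n, slip):
    """Monte Carlo estimate of the policy's expected return from n episodes."""
    return sum(rollout(policy, rng, slip) for _ in range(n)) / n


def hill_climb(policy, rng, n=50, eps=0.5, slip=0.0):
    """Accept single-observation changes whose estimated return beats the
    current policy's estimate by more than eps; stop when none does."""
    policy = dict(policy)
    improved = True
    while improved:
        improved = False
        base = estimate(policy, rng, n, slip)
        for obs in sorted(policy):
            for action in ACTIONS:
                if action == policy[obs]:
                    continue
                candidate = {**policy, obs: action}
                value = estimate(candidate, rng, n, slip)
                if value > base + eps:
                    policy, base, improved = candidate, value, True
    return policy
```

With `slip=0.0` the environment is deterministic. Starting from `{0: 0, 1: 1}` the search moves to `{0: 1, 1: 1}`, which reaches the goal in three steps. Starting from `{0: 0, 1: 0}` it stays put, because no single-observation change improves the return. That second case shows why guarantees of this kind are stated for locally optimal policies.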