Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence.
Gross motion planning—a survey. ACM Computing Surveys (CSUR).
Technical Note: Q-Learning. Machine Learning.
Scaling reinforcement learning algorithms by learning variable temporal resolution models. ML92 Proceedings of the ninth international workshop on Machine learning.
Asynchronous Stochastic Approximation and Q-Learning. Machine Learning.
When the best move isn't optimal: Q-learning with exploration. AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2).
Neurocontroller using dynamic state feedback for compensatory control. Transactions of the Society for Computer Simulation International - Special issue: simulation methodology in transportation systems.
Module-Based Reinforcement Learning: Experiments with a Real Robot. Machine Learning - Special issue on learning in autonomous robots.
Multi-time models for temporally abstract planning. NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10.
Bounded-parameter Markov decision process. Artificial Intelligence.
Markov Decision Processes: Discrete Stochastic Dynamic Programming.
Introduction to Reinforcement Learning.
Advances in Neural Information Processing Systems 5, [NIPS Conference].
Dynamic Programming.
Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms.
Reinforcement Learning in Continuous Time and Space. Neural Computation.
Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
Value Function Based Reinforcement Learning in Changing Markovian Environments. The Journal of Machine Learning Research.
Factored temporal difference learning in the new ties environment. Acta Cybernetica.
Reinforcement Learning: A Tutorial Survey and Recent Advances. INFORMS Journal on Computing.
AGI architecture measures human parameters and optimizes human performance. AGI'11 Proceedings of the 4th international conference on Artificial general intelligence.
In this paper, ε-MDP models are introduced, and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
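The claim that Q-learning stays near-optimal in a varying environment can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper. The environment is modelled here as an ε-MDP in this sense: at every step, each transition distribution may differ from a fixed base model `P0` by at most ε in L1 distance. The drift is produced by mixing the base row with a random distribution. The two-state MDP, the rewards, and the helper names (`perturbed_row`, `value_iteration`, `q_learning`, `greedy`) are all hypothetical. The sketch does not reproduce the paper's theorems, bounds, event-learning, or the pendulum experiments.

```python
import random

GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2

# Hypothetical base MDP (illustration only): P0[s][a] is the next-state
# distribution and R[s][a] the immediate reward.
P0 = [
    [[1.0, 0.0], [0.1, 0.9]],  # state 0: action 0 stays, action 1 usually moves to state 1
    [[0.1, 0.9], [0.9, 0.1]],  # state 1: action 0 usually stays, action 1 usually moves to state 0
]
R = [[0.0, 0.0], [1.0, 0.0]]   # only (state 1, action 0) is rewarded


def perturbed_row(base, eps, rng):
    """Return (1 - eps/2) * base + (eps/2) * q for a random distribution q.

    The L1 distance to the base row is (eps/2) * ||q - base||_1 <= eps,
    so every row the learner experiences stays within eps of P0 (assumed
    reading of an eps-MDP).
    """
    weights = [rng.random() for _ in base]
    total = sum(weights)
    q = [w / total for w in weights]
    mix = eps / 2
    return [(1 - mix) * b + mix * qi for b, qi in zip(base, q)]


def value_iteration(iters=1000):
    """Optimal Q-values of the unperturbed base MDP, used as a reference."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        v = [max(row) for row in q]
        q = [[R[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P0[s][a]))
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
    return q


def q_learning(eps, steps=100_000, alpha=0.05, explore=0.2, seed=0):
    """Tabular Q-learning while the transition model drifts within eps of P0."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        # Epsilon-greedy behaviour policy; `explore` is unrelated to the model eps.
        if rng.random() < explore:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda b: q[s][b])
        # The environment uses a freshly perturbed transition row at every step.
        p = perturbed_row(P0[s][a], eps, rng)
        s2 = rng.choices(range(N_STATES), weights=p)[0]
        target = R[s][a] + GAMMA * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])
        s = s2
    return q


def greedy(q):
    """Greedy policy (one action index per state) for a Q-table."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q_star = value_iteration()
    q_eps = q_learning(eps=0.2)
    print("base-MDP greedy policy:", greedy(q_star))
    print("learned greedy policy: ", greedy(q_eps))
    gap = max(abs(x - y) for rx, ry in zip(q_eps, q_star) for x, y in zip(rx, ry))
    print("max |Q - Q*|:", gap)
```

In this toy setting, the learned greedy policy should match the base model's optimal policy. The learned Q-values should stay within a bounded distance of the base model's Q*. The perturbations are drawn independently at every step, so the learner effectively faces a stationary averaged MDP. Slowly drifting or adversarial changes, which an ε-MDP also permits, are not exercised by this sketch.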