ε-MDPs: Learning in Varying Environments

  • Authors:
  • István Szita; Bálint Takács; András Lőrincz

  • Affiliations:
  • Department of Information Systems, Eötvös Loránd University, Pázmány Péter sétány 1/C, Budapest, Hungary H-1117

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

In this paper, ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
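To make the setting concrete, below is a minimal sketch of plain tabular Q-learning in a small chain MDP whose transition noise drifts slowly within a fixed band, loosely mirroring the "varying environment" idea in the abstract. The chain environment, the drift model, and all parameter values are illustrative assumptions, not the paper's construction (the paper's experiments use the two-segment pendulum and the event-learning algorithm).

```python
# Sketch only: generic tabular Q-learning in a slowly varying chain MDP.
# Environment, drift bound, and parameters are assumptions for illustration.
import random

N_STATES, N_ACTIONS = 5, 2     # small chain MDP: move left (0) or right (1)
GAMMA, ALPHA, EPS_GREEDY = 0.95, 0.1, 0.1
DRIFT = 0.05                   # bound on how far the transition noise may wander

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action, slip):
    """One transition; `slip` is the time-varying probability that the action fails."""
    move = 1 if action == 1 else -1
    if random.random() < slip:            # perturbation stays inside the fixed band
        move = -move
    next_state = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

state, slip = 0, 0.0
for t in range(20000):
    # let the transition noise drift slowly, but keep it inside the band
    slip = min(max(slip + random.uniform(-0.001, 0.001), 0.0), DRIFT)

    # epsilon-greedy action selection
    if random.random() < EPS_GREEDY:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])

    next_state, reward = step(state, action, slip)

    # standard Q-learning update; under bounded perturbations the values settle
    # into a neighbourhood of the optimum rather than a single fixed point
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])

    state = 0 if next_state == N_STATES - 1 else next_state  # restart at the goal

print("Greedy policy:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```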