IEEE Transactions on Systems, Man and Cybernetics
Learning optimal discriminant functions through a cooperative game of automata. IEEE Transactions on Systems, Man and Cybernetics.
Learning automata: an introduction.
Convergent activation dynamics in continuous time networks. Neural Networks.
A model for reasoning about persistence and causation. Computational Intelligence.
Proceedings of the seventh international conference (1990) on Machine learning
A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research.
Practical Issues in Temporal Difference Learning. Machine Learning.
The Convergence of TD(λ) for General λ. Machine Learning.
Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM Journal on Control and Optimization.
Efficient learning and planning within the Dyna framework. Adaptive Behavior.
Asynchronous Stochastic Approximation and Q-Learning. Machine Learning.
An Upper Bound on the Loss from Approximate Optimal-Value Functions. Machine Learning.
Reinforcement learning algorithms for average-payoff Markovian decision processes. AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1).
Learning to act using real-time dynamic programming. Artificial Intelligence - Special volume on computational research on interaction and agency, part 1.
Linear least-squares algorithms for temporal difference learning. Machine Learning - Special issue on reinforcement learning.
Feature-based methods for large scale dynamic programming. Machine Learning - Special issue on reinforcement learning.
Reinforcement learning with replacing eligibility traces. Machine Learning - Special issue on reinforcement learning.
Average reward reinforcement learning: foundations, algorithms, and empirical results. Machine Learning - Special issue on reinforcement learning.
Stochastic approximation with two time scales. Systems & Control Letters.
Asynchronous Stochastic Approximations. SIAM Journal on Control and Optimization.
Model-based average reward reinforcement learning. Artificial Intelligence.
Planning and acting in partially observable stochastic domains. Artificial Intelligence.
Reinforcement learning with hierarchies of machines. NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10.
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning. SIAM Journal on Control and Optimization.
Actor-Critic-Type Learning Algorithms for Markov Decision Processes. SIAM Journal on Control and Optimization.
Bounded-parameter Markov decision processes. Artificial Intelligence.
Hierarchical multi-agent reinforcement learning. Proceedings of the fifth international conference on Autonomous agents.
Multiagent learning using a variable learning rate. Artificial Intelligence.
Dynamic Programming and Optimal Control.
Learning Automata and Stochastic Optimization.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Learning Algorithms for Markov Decision Processes with Average Cost. SIAM Journal on Control and Optimization.
Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms. SIAM Journal on Control and Optimization.
Kernel-Based Reinforcement Learning. Machine Learning.
Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning.
Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems.
Introduction: The Challenge of Reinforcement Learning. Machine Learning.
Friend-or-Foe Q-learning in General-Sum Games. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
Reinforcement Learning in POMDPs with Function Approximation. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Practical Reinforcement Learning in Continuous Spaces. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
PEGASUS: A policy search method for large MDPs and POMDPs. UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.
Q-Learning for Risk-Sensitive Control. Mathematics of Operations Research.
Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning.
ε-MDPs: learning in varying environments. The Journal of Machine Learning Research.
Nash Q-learning for general-sum stochastic games. The Journal of Machine Learning Research.
A Tabu-Search Hyperheuristic for Timetabling and Rostering. Journal of Heuristics.
The Linear Programming Approach to Approximate Dynamic Programming. Operations Research.
The Journal of Machine Learning Research
INFORMS Journal on Computing
Evolutionary Function Approximation for Reinforcement Learning. The Journal of Machine Learning Research.
On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation.
Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research.
Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research.
Reinforcement learning: a survey. Journal of Artificial Intelligence Research.
Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research.
A reinforcement learning approach to job-shop scheduling. IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2.
Reinforcement learning as a means of dynamic aggregate QoS provisioning. Art-QoS'03 Proceedings of the 2003 international conference on Architectures for quality of service in the internet.
Actor-critic algorithms for hierarchical Markov decision processes. Automatica (Journal of IFAC).
IEEE Transactions on Wireless Communications
Approximate stochastic annealing for online control of infinite horizon Markov decision processes. Automatica (Journal of IFAC).
Policy sharing between multiple mobile robots using decision trees. Information Sciences: an International Journal.
Generation of tests for programming challenge tasks using multi-objective optimization. Proceedings of the 15th annual conference companion on Genetic and evolutionary computation.
In the last few years, reinforcement learning (RL), also called adaptive (or approximate) dynamic programming, has emerged as a powerful tool for solving complex sequential decision-making problems in control theory. Although seminal research in this area was performed in the artificial intelligence (AI) community, it has more recently attracted the attention of optimization theorists because of several noteworthy success stories from operations management. It is on large-scale and complex problems of dynamic optimization, in particular the Markov decision problem (MDP) and its variants, that the power of RL is most apparent. It has been known for many years that on large-scale MDPs, the curse of dimensionality and the curse of modeling render classical dynamic programming (DP) ineffective. The excitement in RL stems from its direct attack on these curses, which allows it to solve problems previously considered intractable via classical DP. The success of RL is due to its strong mathematical roots in the principles of DP, Monte Carlo simulation, function approximation, and AI. Topics treated in some detail in this survey are temporal differences, Q-learning, semi-MDPs, and stochastic games. Several recent advances in RL, e.g., policy gradients and hierarchical RL, are covered along with references. Pointers to numerous examples of applications are provided. This overview is aimed at uncovering the mathematical roots of this science so that readers gain a clear understanding of the core concepts and are able to use them in their own research. The survey points to more than 100 references from the literature.
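To give a concrete flavor of one topic the abstract names, the sketch below shows tabular Q-learning on a hypothetical toy MDP (a 4-state chain where moving right toward a goal state yields reward 1). The environment, state count, and learning parameters are illustrative choices, not taken from the survey; the update rule itself is the standard Q-learning recursion.

```python
import random

# Hypothetical toy MDP: states 0..3 on a chain, actions 0 (left) and 1 (right).
# Reaching state 3 yields reward 1 and ends the episode.
N_STATES, GOAL = 4, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # step size, discount, exploration rate

def step(s, a):
    """Deterministic transition: move left or right along the chain."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap on the greedy value of the next state
            target = r + (0.0 if done else GAMMA * max(Q[s2]))
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

Q = train()
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # the greedy policy should move right toward the goal
```

Model-freeness is the point here: the update never consults the transition function directly, only sampled (state, action, reward, next-state) tuples, which is what lets RL sidestep the curse of modeling that the abstract mentions.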