Reinforcement Learning (RL) is learning through direct experimentation. It does not assume the existence of a teacher that provides examples upon which learning of a task takes place. Instead, in RL experience is the only teacher. With historical roots in the study of biological conditioned reflexes, RL attracts the interest of engineers and computer scientists because of its theoretical relevance and potential applications in fields as diverse as Operational Research and Intelligent Robotics.

Computationally, RL is intended to operate in a learning environment composed of two subjects: the learner and a dynamic process. At successive time steps, the learner makes an observation of the process state, selects an action and applies it back to the process. Its goal is to find an action policy that controls the behavior of the dynamic process, guided by signals (reinforcements) that indicate how badly or well it has been performing the required task. These signals are usually associated with a dramatic condition, such as accomplishment of a subtask (reward) or complete failure (punishment), and the learner tries to optimize its behavior by using a performance measure (a function of the received reinforcements). The crucial point is that, in order to do so, the learner must evaluate the conditions (associations between observed states and chosen actions) that led to rewards or punishments.

Starting from basic concepts, this tutorial presents the many flavors of RL algorithms, develops the corresponding mathematical tools, assesses their practical limitations and discusses alternatives that have been proposed for applying RL to realistic tasks.
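
The learner/process interaction described above (observe the state, select an action, apply it, receive a reinforcement) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the tutorial: the five-cell corridor used as the dynamic process, the reward of +1 at one end and punishment of -1 at the other, the function names, the parameter values, and the choice of tabular Q-learning with epsilon-greedy exploration as one possible RL flavor.

```python
import random

# Hypothetical dynamic process: a corridor of 5 cells (states 0..4).
# State 0 is complete failure (punishment), state 4 is the accomplished
# task (reward); both end the episode. The learner starts in the middle.
N_STATES = 5
START = 2
LEFT, RIGHT = 0, 1


def step(state, action):
    """Apply an action to the process; return (next_state, reinforcement, done)."""
    next_state = state - 1 if action == LEFT else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward
    if next_state == 0:
        return next_state, -1.0, True   # punishment
    return next_state, 0.0, False


def select_action(q, state, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice([LEFT, RIGHT])
    best = max(q[state])
    return rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate the value of each (state, action) pair."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = START
        for _t in range(100):  # cap on episode length
            action = select_action(q, state, epsilon, rng)
            next_state, reinforcement, done = step(state, action)
            # Performance measure: discounted sum of future reinforcements.
            target = reinforcement
            if not done:
                target += gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-valued action for each non-terminal state (1..3)."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a])
            for s in range(1, N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

The table `q` plays the role of the learner's evaluation of the associations between observed states and chosen actions: each update credits the pair that was just tried with the reinforcement it led to, plus the discounted value of the state that followed.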