Learning to act using real-time dynamic programming

Authors:
Andrew G. Barto; Steven J. Bradtke; Satinder P. Singh
Affiliations:
Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA (all authors)
Venue:
Artificial Intelligence
Year:
1995

Abstract

Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* (LRTA*) algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.
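
To make the idea concrete, the following is a minimal sketch of a trial-based, RTDP-style loop. It is not the authors' code. It assumes the transition model is fully known, that every action has a positive cost, and that value estimates start at zero so they never overestimate the true cost-to-go. The three-state problem, the `MODEL` table, and the function names are hypothetical and chosen only for illustration. On each step the agent backs up the value of the state it currently occupies, then executes the greedy action.

```python
import random

# Hypothetical toy stochastic shortest-path problem (not from the paper).
# MODEL[state][action] lists (probability, next_state, cost) outcomes.
GOAL = 2
MODEL = {
    0: {"walk": [(1.0, 1, 1.0)],
        "jump": [(0.8, 2, 1.0), (0.2, 0, 1.0)]},
    1: {"walk": [(1.0, 2, 1.0)],
        "jump": [(0.8, 2, 1.0), (0.2, 1, 1.0)]},
}


def q_value(values, state, action):
    """Expected immediate cost plus estimated cost-to-go of the successor."""
    return sum(p * (cost + values[nxt]) for p, nxt, cost in MODEL[state][action])


def greedy_action(values, state):
    """Action with the lowest one-step lookahead cost (first one wins ties)."""
    return min(MODEL[state], key=lambda a: q_value(values, state, a))


def rtdp(start, trials, seed=0, max_steps=1000):
    """Run repeated trials from `start`, backing up only the visited states."""
    rng = random.Random(seed)
    # Assumption: zero is an optimistic initial estimate because costs are positive.
    values = {s: 0.0 for s in list(MODEL) + [GOAL]}
    for _trial in range(trials):
        state = start
        for _step in range(max_steps):  # safety cap on trial length
            if state == GOAL:
                break
            action = greedy_action(values, state)
            # Back up the value of the current state only.
            values[state] = q_value(values, state, action)
            # Act: sample one outcome of the greedy action.
            outcomes = MODEL[state][action]
            weights = [p for p, _nxt, _cost in outcomes]
            _p, state, _cost = rng.choices(outcomes, weights=weights)[0]
    return values


if __name__ == "__main__":
    learned = rtdp(start=0, trials=50)
    print(learned, greedy_action(learned, 0))
```

In this toy problem the estimate for state 0 approaches 1.25, the expected cost of repeatedly choosing `jump`, while state 1 is backed up once on the first trial and never visited again. That reflects the general point that effort is spent only on states the agent actually encounters.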