Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time

Authors:
Andrew W. Moore; Christopher G. Atkeson
Affiliations:
MIT Artificial Intelligence Laboratory, NE43-771, 545 Technology Square, Cambridge, MA 02139. AWM@CS.CMU.EDU; MIT Artificial Intelligence Laboratory, NE43-771, 545 Technology Square, Cambridge, MA 02139. CGA@AI.MIT.EDU
Venue:
Machine Learning
Year:
1993

Abstract

We present a new algorithm, prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare prioritized sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.
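
The idea of prioritizing dynamic programming backups can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the transition model and rewards are already known (the paper's method estimates them from experience and also uses them to guide exploration, which is omitted here), and the names prioritized_sweeping, transitions, rewards, gamma, and theta are hypothetical. States are backed up in order of their current Bellman error, and after each backup only the predecessors of the changed state are reconsidered.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(transitions, rewards, gamma=0.9, theta=1e-6,
                         max_backups=10_000):
    """Prioritized value backups on a known model (illustrative assumption).

    transitions[s][a] is a list of (probability, next_state) pairs;
    rewards[s][a] is the expected immediate reward. A state with no
    actions is treated as terminal with value 0.
    """
    values = defaultdict(float)

    # predecessors[s2] holds every state with some action that can reach s2.
    predecessors = defaultdict(set)
    for s, actions in transitions.items():
        for outcomes in actions.values():
            for prob, s2 in outcomes:
                if prob > 0:
                    predecessors[s2].add(s)

    def backup(s):
        # One-step Bellman optimality backup for state s.
        actions = transitions.get(s)
        if not actions:
            return 0.0
        return max(
            rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
            for a, outcomes in actions.items()
        )

    # Seed the queue with every state whose Bellman error exceeds theta.
    # heapq is a min-heap, so priorities are stored negated.
    queue = []
    for s in transitions:
        error = abs(backup(s) - values[s])
        if error > theta:
            heapq.heappush(queue, (-error, s))

    backups = 0
    while queue and backups < max_backups:
        _, s = heapq.heappop(queue)
        new_value = backup(s)
        change = abs(new_value - values[s])
        values[s] = new_value
        backups += 1
        if change <= theta:
            continue  # stale queue entry; nothing to propagate
        # Only predecessors of s can have had their Bellman error changed.
        for pred in predecessors[s]:
            error = abs(backup(pred) - values[pred])
            if error > theta:
                heapq.heappush(queue, (-error, pred))
    return dict(values), backups


# Hypothetical four-state chain 0 -> 1 -> 2 -> 3, with reward 1 on entering
# the terminal state 3.
chain_transitions = {
    0: {"go": [(1.0, 1)]},
    1: {"go": [(1.0, 2)]},
    2: {"go": [(1.0, 3)]},
    3: {},
}
chain_rewards = {0: {"go": 0.0}, 1: {"go": 0.0}, 2: {"go": 1.0}}
chain_values, chain_backups = prioritized_sweeping(chain_transitions,
                                                   chain_rewards)
```

On this chain the queue starts with only the state next to the goal, and the value change then propagates backward one predecessor at a time, so each non-terminal state is backed up once instead of being swept repeatedly.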