Accelerating reinforcement learning through implicit imitation

Authors:
Bob Price;Craig Boutilier
Affiliations:
Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada;Department of Computer Science, University of Toronto, Toronto, ON, Canada
Venue:
Journal of Artificial Intelligence Research
Year:
2003


Abstract

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
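
The abstract does not state the update rule, so the following is only a minimal sketch of how observing a mentor might inform value estimates in unvisited states. It assumes a small tabular MDP with state rewards, a learner and mentor with identical abilities, and a Bellman backup that additionally considers the mentor's observed state-to-state transitions. The function names (`augmented_backup`, `solve`), the 4-state chain, and the pessimistic self-loop model for unvisited states are all hypothetical choices made for illustration, and plain value-iteration sweeps stand in for prioritized sweeping.

```python
import numpy as np

def augmented_backup(V, R, P_own, P_mentor, gamma=0.9):
    """One sweep of a Bellman backup that also uses observed mentor transitions.

    V        : (S,) current value estimates
    R        : (S,) state rewards
    P_own    : (S, A, S) learner's estimated model for its own actions
    P_mentor : (S, S) estimated mentor state-to-state transition frequencies
               (a row of zeros means the mentor was never seen in that state)
    """
    own = (P_own @ V).max(axis=1)          # best expected value over own actions
    mentor = P_mentor @ V                  # expected value under mentor dynamics
    observed = P_mentor.sum(axis=1) > 0    # states with mentor observations
    best = np.where(observed, np.maximum(own, mentor), own)
    return R + gamma * best

def solve(R, P_own, P_mentor, gamma=0.9, sweeps=300):
    """Repeat the backup for a fixed number of sweeps (illustrative only)."""
    V = np.zeros(len(R))
    for _ in range(sweeps):
        V = augmented_backup(V, R, P_own, P_mentor, gamma)
    return V

# Hypothetical 4-state chain: reward only in the absorbing state 3.
S, A = 4, 2
R = np.array([0.0, 0.0, 0.0, 1.0])

# Assumption: the learner has not explored yet, so its model is a pessimistic
# self-loop everywhere (state 3 really is absorbing).
P_own = np.zeros((S, A, S))
for s in range(S):
    P_own[s, :, s] = 1.0

# Assumption: the mentor was observed walking 0 -> 1 -> 2 -> 3 and staying at 3.
P_mentor = np.zeros((S, S))
P_mentor[0, 1] = P_mentor[1, 2] = P_mentor[2, 3] = P_mentor[3, 3] = 1.0

V_without = solve(R, P_own, np.zeros((S, S)))  # no mentor observations
V_with = solve(R, P_own, P_mentor)             # with mentor observations

print("without mentor:", np.round(V_without, 3))
print("with mentor:   ", np.round(V_with, 3))
```

In this toy setting the learner's own model assigns no value to states 0 through 2, while the mentor's observed transitions propagate value back from the rewarding state, which is the kind of information about unvisited parts of the state space the abstract describes. Handling mentors with different action sets would need an additional feasibility check that this sketch omits.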