Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction

Authors:
Richard S. Sutton; Joseph Modayil; Michael Delp; Thomas Degris; Patrick M. Pilarski; Adam White; Doina Precup
Affiliations:
University of Alberta, Canada (Sutton, Modayil, Delp, Degris, Pilarski, White); McGill University, Montreal, Canada (Precup)
Venue:
The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Year:
2011

Abstract

Maintaining accurate world knowledge in a complex and changing environment is a perennial problem for robots and other artificial intelligence systems. Our architecture for addressing this problem, called Horde, consists of a large number of independent reinforcement learning sub-agents, or demons. Each demon is responsible for answering a single predictive or goal-oriented question about the world, thereby contributing in a factored, modular way to the system's overall knowledge. Each question takes the form of a value function, but each demon has its own policy, reward function, termination function, and terminal-reward function, unrelated to those of the base problem. All demons learn in parallel from the same experience, so as to extract the maximum training information from whatever actions the system as a whole takes. Gradient-based temporal-difference learning methods are used to learn efficiently and reliably with function approximation in this off-policy setting. Horde runs in constant time and memory per time step and is thus suitable for learning online in real-time applications such as robotics. We present results from a multi-sensored mobile robot on which Horde successfully learns goal-oriented behaviors and long-term predictions from off-policy experience. Horde is a significant incremental step towards a real-time architecture for efficient learning of general knowledge from unsupervised sensorimotor interaction.
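The sketch below illustrates how a demon's question (a target policy, its own reward or "cumulant", a termination function, and a terminal reward) can drive an off-policy gradient-TD update with linear function approximation, and how many demons can share one stream of experience. It is not the authors' implementation. The abstract only says that gradient-based TD methods are used, so the GTD(λ) prediction update with an importance-sampling ratio is our assumed choice. Goal-oriented (control) demons would need a control variant, such as GQ(λ), which is not shown here. The names `Demon`, `Horde`, `uniform_policy`, and `single_state_demo`, and the toy single-state stream, are hypothetical. Each step touches every demon once with fixed-size vectors, so time and memory per step stay constant, matching the property claimed above.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class Demon:
    """Hypothetical prediction demon: GTD(lambda) with linear function approximation.

    Its question is defined by a target policy pi(a | phi), a cumulant (the
    demon's own "reward"), a continuation probability gamma = 1 - termination
    probability, and a terminal reward z, all evaluated on feature vectors.
    """

    def __init__(self, n_features: int,
                 cumulant: Callable[[Vector], float],
                 continuation: Callable[[Vector], float],
                 terminal_reward: Callable[[Vector], float],
                 target_policy: Callable[[Vector, int], float],
                 alpha: float = 0.1, beta: float = 0.1, lam: float = 0.0) -> None:
        self.cumulant = cumulant
        self.continuation = continuation
        self.terminal_reward = terminal_reward
        self.target_policy = target_policy
        self.alpha, self.beta, self.lam = alpha, beta, lam  # step sizes, trace decay
        self.theta = [0.0] * n_features  # primary weights: the prediction
        self.w = [0.0] * n_features      # secondary weights for the gradient correction
        self.e = [0.0] * n_features      # eligibility trace
        self.gamma_prev = 0.0            # continuation at the current state, gamma_t

    def predict(self, phi: Vector) -> float:
        return dot(self.theta, phi)

    def update(self, phi: Vector, action: int, behavior_prob: float,
               phi_next: Vector) -> None:
        # Importance-sampling ratio corrects for learning off-policy.
        rho = self.target_policy(phi, action) / behavior_prob
        g = self.continuation(phi_next)
        # Continue with prob g (bootstrap), terminate with prob 1 - g (collect z).
        target = (self.cumulant(phi_next) + (1.0 - g) * self.terminal_reward(phi_next)
                  + g * dot(self.theta, phi_next))
        delta = target - dot(self.theta, phi)
        self.e = [rho * (p + self.gamma_prev * self.lam * ei)
                  for p, ei in zip(phi, self.e)]
        w_e, w_phi = dot(self.w, self.e), dot(self.w, phi)  # computed with the old w
        self.theta = [t + self.alpha * (delta * ei - g * (1.0 - self.lam) * w_e * pn)
                      for t, ei, pn in zip(self.theta, self.e, phi_next)]
        self.w = [wi + self.beta * (delta * ei - w_phi * p)
                  for wi, ei, p in zip(self.w, self.e, phi)]
        self.gamma_prev = g


class Horde:
    """Every demon learns from the same transition; per-step cost is fixed."""

    def __init__(self, demons: Sequence[Demon]) -> None:
        self.demons = list(demons)

    def step(self, phi: Vector, action: int, behavior_prob: float,
             phi_next: Vector) -> None:
        for demon in self.demons:
            demon.update(phi, action, behavior_prob, phi_next)


def uniform_policy(phi: Vector, action: int) -> float:
    return 0.5  # two actions, each with probability 0.5


def single_state_demo(steps: int = 20000, seed: int = 0) -> List[float]:
    """Toy stream: one state (feature [1.0]); behavior picks action 0 or 1 uniformly."""
    horizon = Demon(1, cumulant=lambda p: 1.0, continuation=lambda p: 0.9,
                    terminal_reward=lambda p: 0.0, target_policy=uniform_policy)
    one_step = Demon(1, cumulant=lambda p: 1.0, continuation=lambda p: 0.0,
                     terminal_reward=lambda p: 0.5, target_policy=uniform_policy)
    horde = Horde([horizon, one_step])
    rng = random.Random(seed)
    phi = [1.0]
    for _ in range(steps):
        horde.step(phi, rng.randrange(2), 0.5, phi)
    return [demon.predict(phi) for demon in horde.demons]


if __name__ == "__main__":
    print(single_state_demo())
```

On this toy stream the `horizon` demon should approach 1 / (1 - 0.9) = 10, and the `one_step` demon should approach 1 + 0.5 = 1.5 (one cumulant, then certain termination with terminal reward z = 0.5). A demon whose target policy never takes the behavior's chosen action gets ρ = 0 on that step, so its prediction is left unchanged. This is how each demon filters the shared experience down to what is relevant to its own policy.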