Multi-timescale nexting in a reinforcement learning robot

Authors:
Joseph Modayil; Adam White; Richard S. Sutton
Affiliations:
Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Canada (all three authors)
Venue:
Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
Year:
2014

Abstract

The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
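For reference, the prediction target described above can be written in the usual value-function notation. The symbols below are our own, not necessarily the paper's:

- R_{t+1} is the pseudo reward computed from the sensor signals.
- γ is the discount rate.
- Δt is the update period.

Reading 1/(1 - γ) steps as the timescale is a common convention, assumed here rather than taken from the paper.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad
\text{timescale} \approx \frac{\Delta t}{1-\gamma}
```

Take Δt = 0.1 s, which gives ten updates per second. Then γ = 0 yields a 0.1 s (one-step) prediction, and γ = 1 - 0.1/8 = 0.9875 yields an 8 s prediction. Letting the discount depend on the state, as in the general form of nexting, replaces γ^k with a product of per-step discounts:

```latex
G_t = R_{t+1} + \gamma_{t+1} G_{t+1}
    = \sum_{k=0}^{\infty} \Big( \prod_{j=1}^{k} \gamma_{t+j} \Big) R_{t+k+1}
```

This reduces to the first formula when every γ_{t+j} equals γ.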
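The learning rule named in the abstract can be sketched in plain Python. That rule is linear TD(λ) over one shared tile-coded feature vector, with a separate weight vector per prediction. This is a minimal illustration under our own assumptions, not the authors' implementation. Everything below is hypothetical and chosen for illustration:

- the names `gamma_for_timescale`, `TileCoder1D`, `NextingPrediction` and `sensor`;
- the use of accumulating traces;
- the 1/(1 - γ) timescale convention;
- the one-sensor demo.

The discount passed to `update` may change on every step, so the same code also covers the state-dependent case.

```python
import math


def gamma_for_timescale(seconds: float, dt: float = 0.1) -> float:
    """Hypothetical helper: the discount whose horizon 1/(1 - gamma) steps spans `seconds`.

    Assumes one update every `dt` seconds (10 Hz, matching the abstract).
    """
    steps = seconds / dt
    if steps < 1.0:
        raise ValueError("timescale must be at least one time step")
    return 1.0 - 1.0 / steps


class TileCoder1D:
    """Hypothetical minimal tile coder for one scalar sensor reading in [lo, hi]."""

    def __init__(self, lo: float, hi: float, tiles: int, tilings: int):
        self.lo, self.tiles, self.tilings = lo, tiles, tilings
        self.width = (hi - lo) / tiles
        # Each tiling gets one extra tile because its grid is shifted by an offset.
        self.n_features = tilings * (tiles + 1)

    def active(self, x: float) -> list[int]:
        """Indices of the binary features that are on (exactly one per tiling)."""
        idx = []
        for i in range(self.tilings):
            offset = i * self.width / self.tilings  # shift each tiling by a fraction of a tile
            j = int((x - self.lo + offset) // self.width)
            j = min(max(j, 0), self.tiles)  # clamp readings outside [lo, hi]
            idx.append(i * (self.tiles + 1) + j)
        return idx


class NextingPrediction:
    """One prediction learned by linear TD(lambda) with accumulating traces."""

    def __init__(self, n_features: int, alpha: float, lam: float):
        self.w = [0.0] * n_features  # weight vector
        self.e = [0.0] * n_features  # eligibility trace
        self.alpha, self.lam = alpha, lam
        self.gamma_t = 0.0  # discount of the current state; irrelevant at first since e starts at 0

    def value(self, active: list[int]) -> float:
        return sum(self.w[i] for i in active)

    def update(self, active: list[int], reward: float, gamma_next: float,
               active_next: list[int]) -> None:
        # e_t = gamma_t * lambda * e_{t-1} + x_t  (binary x_t: add 1 at active indices)
        decay = self.gamma_t * self.lam
        self.e = [decay * v for v in self.e]
        for i in active:
            self.e[i] += 1.0
        # delta_t = r_{t+1} + gamma_{t+1} * w.x_{t+1} - w.x_t
        delta = reward + gamma_next * self.value(active_next) - self.value(active)
        step = self.alpha * delta
        self.w = [wi + step * ei for wi, ei in zip(self.w, self.e)]
        self.gamma_t = gamma_next  # becomes gamma_t on the next call


def sensor(t: int) -> float:
    """Hypothetical sensor: a 5-second oscillation sampled at 10 Hz."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * t / 50)


if __name__ == "__main__":
    coder = TileCoder1D(lo=0.0, hi=1.0, tiles=10, tilings=8)
    alpha = 0.1 / coder.tilings  # spread the step size over the active tiles
    predictions = {T: NextingPrediction(coder.n_features, alpha, lam=0.9) for T in (0.1, 1.0)}
    x = coder.active(sensor(0))
    for t in range(1, 3000):
        r = sensor(t)  # pseudo reward: the raw sensor value itself
        x_next = coder.active(r)
        for T, p in predictions.items():
            p.update(x, r, gamma_for_timescale(T), x_next)
        x = x_next
    for T, p in predictions.items():
        print(f"{T} s prediction: {p.value(x):.3f}")
```

Every prediction reads the same active feature indices. Adding another signal or timescale therefore costs only one more weight vector and one more trace vector.

The pure-Python loops touch every feature on each step and are written for readability. A real-time system making thousands of predictions would vectorize them or track only nonzero traces.

The demo's state is just the current reading, which cannot distinguish a rising signal from a falling one. Its predictions are therefore only rough.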