Input generalization in delayed reinforcement learning: an algorithm and performance comparisons

  • Authors:
  • David Chapman; Leslie Pack Kaelbling

  • Affiliations:
  • Teleos Research, Palo Alto, CA; Teleos Research, Palo Alto, CA

  • Venue:
  • IJCAI'91: Proceedings of the 12th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1991

Abstract

Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains, but a number of hurdles must be cleared before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. The algorithm recursively splits the state space using statistical measures of differences in the reinforcement received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning; we compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previously reported success of backpropagation can be explained by the linearity of its application domain, and that in another domain G reliably found the optimal policy, whereas no run of backpropagation, across many combinations of parameters, did.
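
The abstract only sketches the G algorithm, so the following Python fragment is a rough, hypothetical illustration of what "recursive splitting of the state space based on statistical measures of differences in reinforcement" can look like: a tree over bit-vector states that splits a leaf on the input bit whose two halves receive the most significantly different reinforcement, using Welch's t statistic. The names, threshold, and choice of test are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative sketch (assumed details, not the paper's exact G algorithm):
# recursively split a bit-vector state space on the input bit whose 0/1
# halves show the most statistically different reinforcement.

from dataclasses import dataclass, field
from statistics import mean, variance
from math import sqrt
from typing import List, Optional, Tuple

Sample = Tuple[Tuple[int, ...], float]  # (input bit vector, reinforcement)

def welch_t(xs: List[float], ys: List[float]) -> float:
    """Welch's t statistic for two independent samples (0 if degenerate)."""
    if len(xs) < 2 or len(ys) < 2:
        return 0.0
    denom = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return abs(mean(xs) - mean(ys)) / denom if denom > 0 else 0.0

@dataclass
class Node:
    samples: List[Sample] = field(default_factory=list)
    split_bit: Optional[int] = None
    children: Optional[Tuple["Node", "Node"]] = None

    def value(self) -> float:
        """Predicted reinforcement for states that fall in this leaf."""
        return mean(r for _, r in self.samples) if self.samples else 0.0

def split(node: Node, n_bits: int, t_threshold: float = 2.0) -> None:
    """Recursively split a leaf on the most relevant input bit, if any."""
    best_bit, best_t = None, t_threshold
    for bit in range(n_bits):
        zeros = [r for s, r in node.samples if s[bit] == 0]
        ones = [r for s, r in node.samples if s[bit] == 1]
        t = welch_t(zeros, ones)
        if t > best_t:
            best_bit, best_t = bit, t
    if best_bit is None:
        return  # no bit makes a statistically significant difference
    node.split_bit = best_bit
    left = Node([sv for sv in node.samples if sv[0][best_bit] == 0])
    right = Node([sv for sv in node.samples if sv[0][best_bit] == 1])
    node.children = (left, right)
    split(left, n_bits, t_threshold)
    split(right, n_bits, t_threshold)
```

In this reading, leaves that are never split generalize: all states reaching the same leaf share one predicted reinforcement (and, in a full learner, one action-value estimate), while the statistical test guards against splitting on irrelevant input bits.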