Efficient exploration for optimizing immediate reward

  • Authors:
  • Dale Schuurmans; Lloyd Greenwald

  • Venue:
  • AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference
  • Year:
  • 1999

Abstract

We consider the problem of learning an effective behavior strategy from reward. Although much studied, the question of how to use prior knowledge to scale optimal behavior learning up to real-world problems remains open.

We investigate the inherent data-complexity of behavior learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learning, where one must also cope with state dynamics, immediate reward learning is still a common problem and is fundamentally harder than supervised learning.

For optimizing immediate reward, prior knowledge can be expressed either as a bias on the space of possible reward models or as a bias on the space of possible controllers. We investigate the two paradigmatic learning approaches of indirect (reward-model) learning and direct-control learning, and show that neither uniformly dominates the other in general. Model-based learning has the advantage of generalizing reward experiences across states and actions, whereas direct-control learning has the advantage of focusing only on potentially optimal actions and avoiding learning irrelevant world details. Each strategy can be strongly advantageous in different circumstances. We introduce hybrid learning strategies that combine the benefits of both approaches and uniformly improve their learning efficiency.
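
To make the contrast between the two paradigms concrete, the following is a minimal Python sketch of an indirect (reward-model) learner versus a direct-control learner on a toy immediate-reward task. The problem setup, class names, and the simple sampling and commitment schemes are illustrative assumptions, not the algorithms or analysis from the paper.

```python
import random
from collections import defaultdict

# Toy immediate-reward problem (illustrative sketch only, not the paper's
# setup): finite states and actions with unknown mean rewards r(s, a).
# The learner repeatedly sees a state, chooses an action, and observes a
# noisy immediate reward -- there are no state dynamics to plan over.

random.seed(0)
STATES, ACTIONS = range(4), range(3)
TRUE_REWARD = {(s, a): random.random() for s in STATES for a in ACTIONS}

def reward(s, a):
    """Noisy immediate reward for taking action a in state s."""
    return TRUE_REWARD[(s, a)] + random.gauss(0.0, 0.1)

class IndirectLearner:
    """Indirect (reward-model) learning: estimate r(s, a) from all observed
    rewards, generalizing experience across state-action pairs, then act
    greedily with respect to the learned model."""
    def __init__(self):
        self.sum = defaultdict(float)
        self.n = defaultdict(int)

    def observe(self, s, a, r):
        self.sum[(s, a)] += r
        self.n[(s, a)] += 1

    def act(self, s):
        def estimate(a):
            return self.sum[(s, a)] / self.n[(s, a)] if self.n[(s, a)] else 0.0
        return max(ACTIONS, key=estimate)

class DirectLearner:
    """Direct-control learning: search over controllers by comparing candidate
    actions head-to-head in each state, committing to a winner and never
    modeling the rewards of actions already judged inferior."""
    def __init__(self, trials=5):
        self.trials = trials
        self.policy = {}

    def act(self, s):
        if s not in self.policy:
            samples = {a: sum(reward(s, a) for _ in range(self.trials)) / self.trials
                       for a in ACTIONS}
            self.policy[s] = max(samples, key=samples.get)
        return self.policy[s]

# Usage: feed the indirect learner a few hundred randomly explored
# interactions, then compare its greedy choices with the policy the
# direct learner commits to.
indirect = IndirectLearner()
for _ in range(300):
    s = random.choice(STATES)
    a = random.choice(ACTIONS)   # pure exploration, for simplicity
    indirect.observe(s, a, reward(s, a))

direct = DirectLearner()
for s in STATES:
    print(s, indirect.act(s), direct.act(s))
```

The sketch mirrors the trade-off described in the abstract: the indirect learner spends samples estimating every reward entry but can reuse that model anywhere, while the direct learner spends samples only on deciding which action wins in each state and learns nothing else about the world.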