We consider the problem of learning an effective behavior strategy from reward. Although much studied, how to use prior knowledge to scale optimal behavior learning up to real-world problems remains an important open question.

We investigate the inherent data complexity of behavior learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learning, where one must also cope with state dynamics, immediate-reward learning is still a common problem and is fundamentally harder than supervised learning.

For optimizing immediate reward, prior knowledge can be expressed either as a bias on the space of possible reward models or as a bias on the space of possible controllers. We investigate the two paradigmatic learning approaches of indirect (reward-model) learning and direct-control learning, and show that neither uniformly dominates the other in general. Model-based learning has the advantage of generalizing reward experiences across states and actions, whereas direct-control learning has the advantage of focusing only on potentially optimal actions and avoiding learning irrelevant world details. Each strategy can be strongly advantageous in different circumstances. We introduce hybrid learning strategies that combine the benefits of both approaches and uniformly improve their learning efficiency.
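The contrast between the two paradigms can be sketched with a toy immediate-reward problem. This is a hypothetical illustration, not the paper's construction: the hidden reward function, the tabular reward-model estimator, and the random controller search are all assumptions made for the sake of a runnable example.

```python
import random

STATES = range(10)
ACTIONS = range(3)

# Hidden reward function (unknown to the learners; an assumption for this toy).
def true_reward(state, action):
    return 1.0 if action == state % 3 else 0.0

# Indirect (reward-model) learning: estimate r(s, a) from samples,
# then act greedily with respect to the learned model.
def indirect_learner(samples):
    totals, counts = {}, {}
    for s, a, r in samples:
        totals[(s, a)] = totals.get((s, a), 0.0) + r
        counts[(s, a)] = counts.get((s, a), 0) + 1
    model = {k: totals[k] / counts[k] for k in totals}
    def policy(s):
        # Greedy action under the estimated reward model (0.0 if unseen).
        return max(ACTIONS, key=lambda a: model.get((s, a), 0.0))
    return policy

# Direct-control learning: search the controller space directly, keeping the
# candidate with the best empirical reward, without ever modeling r itself.
def direct_learner(samples, n_candidates=50):
    lookup = {(s, a): r for s, a, r in samples}
    def random_controller():
        table = {s: random.choice(list(ACTIONS)) for s in STATES}
        return lambda s, t=table: t[s]
    def empirical_score(pi):
        seen = [lookup[(s, pi(s))] for s in STATES if (s, pi(s)) in lookup]
        return sum(seen) / len(seen) if seen else 0.0
    return max((random_controller() for _ in range(n_candidates)),
               key=empirical_score)

random.seed(0)
data = [(s, a, true_reward(s, a)) for s in STATES for a in ACTIONS]
pi_indirect = indirect_learner(data)
pi_direct = direct_learner(data)

# Average true reward achieved by each learned policy.
avg = lambda pi: sum(true_reward(s, pi(s)) for s in STATES) / len(STATES)
print(avg(pi_indirect), avg(pi_direct))
```

With exhaustive data the model-based learner recovers the optimal action everywhere, while the direct learner's quality depends on how well its controller search covers the (exponentially large) policy space; with sparse data and a compact controller class the advantage can reverse, which is the trade-off the abstract describes.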