Optimal control: linear quadratic methods.
ALVINN: an autonomous land vehicle in a neural network. Advances in Neural Information Processing Systems 1.
ML92: Proceedings of the Ninth International Workshop on Machine Learning.
Reinforcement Learning.
Neuro-Dynamic Programming.
Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning.
Practical Reinforcement Learning in Continuous Spaces. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Efficient Reinforcement Learning in Factored MDPs. IJCAI '99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
Learning Movement Sequences from Demonstration. ICDL '02: Proceedings of the 2nd International Conference on Development and Learning.
R-max: a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research.
Apprenticeship learning via inverse reinforcement learning. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Qualitative reinforcement learning. ICML '06: Proceedings of the 23rd International Conference on Machine Learning.
Proceedings of the 25th International Conference on Machine Learning.
Autonomous agent learning using an actor-critic algorithm and behavior models. Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 3.
Proposal of Exploitation-Oriented Learning PS-r#. IDEAL '08: Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning.
Probabilistic Inference for Fast Learning in Control. Recent Advances in Reinforcement Learning.
A survey of robot learning from demonstration. Robotics and Autonomous Systems.
Learning Actions through Imitation and Exploration: Towards Humanoid Robots That Learn from Humans. Creating Brain-Like Intelligence.
Neuroevolutionary reinforcement learning for generalized helicopter control. Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation.
Transfer Learning for Reinforcement Learning Domains: A Survey. The Journal of Machine Learning Research.
Provably Efficient Learning with Typed Parametric Models. The Journal of Machine Learning Research.
Autonomous Helicopter Aerobatics through Apprenticeship Learning. International Journal of Robotics Research.
Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robotics and Autonomous Systems.
Policy adaptation with tactile feedback. Proceedings of the 6th International Conference on Human-Robot Interaction.
Reinforcement learning and apprenticeship learning for robotic control. ALT '06: Proceedings of the 17th International Conference on Algorithmic Learning Theory.
Improvement of systems management policies using hybrid reinforcement learning. ECML '06: Proceedings of the 17th European Conference on Machine Learning.
Tactile Guidance for Policy Adaptation. Foundations and Trends in Robotics.
ACIIDS '12: Proceedings of the 4th Asian Conference on Intelligent Information and Database Systems, Volume Part I.
Apprenticeship learning with few examples. Neurocomputing.
Information Sciences: an International Journal.
Journal of Intelligent and Robotic Systems.
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E^3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system toward poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems: on an autonomous helicopter, for example, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting, in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing "exploitation policies" that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results are proved using a martingale construction over relative losses.
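The exploit-only loop the abstract describes can be sketched for a tabular MDP: seed a maximum-likelihood transition model with the teacher's demonstration, then repeatedly solve the estimated MDP and execute the resulting greedy ("exploitation") policy, folding the newly observed transitions back into the model. This is a minimal illustrative sketch, not the paper's actual algorithm or analysis; the function names, the uniform fallback for unseen state-action pairs, and the toy chain environment are all invented for the example.

```python
def value_iteration(P, R, n_states, n_actions, gamma=0.95, iters=200):
    """Greedy policy for a tabular model.

    P[(s, a)] maps next state -> probability; R[s] is the reward in state s.
    """
    V = [0.0] * n_states
    for _ in range(iters):
        Q = [[R[s] + gamma * sum(p * V[t] for t, p in P[(s, a)].items())
              for a in range(n_actions)] for s in range(n_states)]
        V = [max(row) for row in Q]
    return [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]

def estimate_model(counts, n_states, n_actions):
    # Maximum-likelihood transition estimates; unseen (s, a) pairs fall back
    # to a uniform distribution (a modeling choice made for this sketch).
    P = {}
    for s in range(n_states):
        for a in range(n_actions):
            c = counts.get((s, a), {})
            total = sum(c.values())
            if total == 0:
                P[(s, a)] = {t: 1.0 / n_states for t in range(n_states)}
            else:
                P[(s, a)] = {t: n / total for t, n in c.items()}
    return P

def exploit_only(env_step, demo, R, n_states, n_actions, n_iters=10, horizon=50):
    counts = {}

    def record(s, a, t):
        counts.setdefault((s, a), {})
        counts[(s, a)][t] = counts[(s, a)].get(t, 0) + 1

    for s, a, t in demo:                      # seed the model with teacher data
        record(s, a, t)
    for _ in range(n_iters):
        P_hat = estimate_model(counts, n_states, n_actions)
        pi = value_iteration(P_hat, R, n_states, n_actions)  # pure exploitation
        s = 0
        for _ in range(horizon):              # executing the policy refines the model
            a = pi[s]
            t = env_step(s, a)
            record(s, a, t)
            s = t
    return pi

# Toy example: a 3-state chain where action 1 moves right, action 0 moves left,
# and only state 2 gives reward. One teacher rollout reaches the goal state.
R = [0.0, 0.0, 1.0]
def chain_step(s, a):
    return min(s + 1, 2) if a == 1 else max(s - 1, 0)

demo = [(0, 1, 1), (1, 1, 2), (2, 1, 2)]
pi = exploit_only(chain_step, demo, R, n_states=3, n_actions=2)
# pi converges to [1, 1, 1]: always move right, matching the teacher.
```

Note how the sketch mirrors the abstract's claim: the agent never plans an exploratory action, yet the model still becomes accurate along the states its exploitation policies actually visit, which is enough to match the demonstrated behavior.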