We propose a new approach to reinforcement learning for control problems that combines value-function approximation with linear architectures and approximate policy iteration. This approach is motivated by the least-squares temporal-difference learning algorithm (LSTD) for prediction problems, which is known for its efficient use of sample experiences compared to pure temporal-difference algorithms. Until now, LSTD has not had a straightforward application to control problems, mainly because it learns the state value function of a fixed policy, which cannot be used for action selection and control without a model of the underlying process. Our new algorithm, least-squares policy iteration (LSPI), learns the state-action value function, which allows for action selection without a model and for incremental policy improvement within a policy-iteration framework. LSPI is a model-free, off-policy method that can efficiently use (and reuse in each iteration) sample experiences collected in any manner. By separating the sample collection method, the choice of the linear approximation architecture, and the solution method, LSPI allows for focused attention on the distinct elements that contribute to practical reinforcement learning. LSPI is tested on the simple task of balancing an inverted pendulum and the harder task of balancing and riding a bicycle to a target location. In both cases, LSPI learns to control the pendulum or the bicycle merely by observing a relatively small number of trials in which actions are selected randomly. LSPI is also compared against Q-learning (both with and without experience replay) using the same value function architecture. While LSPI achieved good performance fairly consistently on the difficult bicycle task, the Q-learning variants were rarely able to balance for more than a small fraction of the time needed to reach the target location.
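To make the structure described above concrete, the following is a minimal sketch of the policy-iteration loop with a least-squares evaluation step for a linear state-action value function, assuming NumPy. It is not the authors' implementation. The four-state chain, the one-hot basis `phi`, the small ridge term `reg`, and every function name are illustrative assumptions, and the samples are swept exhaustively rather than collected randomly so that the run is deterministic.

```python
import numpy as np

N_STATES, ACTIONS, GAMMA = 4, (0, 1), 0.9   # toy chain (assumed); 0 = left, 1 = right
K = N_STATES * len(ACTIONS)                  # number of basis functions

def phi(s, a):
    """One-hot basis over (state, action) pairs (illustrative choice)."""
    f = np.zeros(K)
    f[s * len(ACTIONS) + a] = 1.0
    return f

def step(s, a):
    """Hypothetical deterministic chain: reward 1 for landing on the last state."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

def greedy(w, s):
    """Model-free action selection: maximize the approximate Q(s, a) = phi(s, a) . w."""
    return max(ACTIONS, key=lambda a: phi(s, a) @ w)

def lstdq(samples, w_policy, reg=1e-6):
    """Least-squares evaluation of the policy that is greedy w.r.t. w_policy."""
    A = reg * np.eye(K)                      # ridge term keeps A invertible (assumption)
    b = np.zeros(K)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w_policy, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, tol=1e-8, max_iter=50):
    """Reuse the same samples in every iteration until the weights stop changing."""
    w = np.zeros(K)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w_new

# One sample per (state, action) pair; any collection scheme could be substituted.
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in ACTIONS]
w = lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
print(policy)
```

The sample set, the basis `phi`, and the solver `lstdq` are independent pieces here, which mirrors the separation the abstract emphasizes: any of the three could be swapped without touching the other two.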