Variable Resolution Discretization in Optimal Control

Authors:
Rémi Munos; Andrew Moore
Affiliations:
Centre de Mathématiques Appliquées, Ecole Polytechnique, 91128 Palaiseau, France. remi.munos@polytechnique.fr (http://www.cmap.polytechnique.fr/~munos/); Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA. awm@cs.cmu.edu
Venue:
Machine Learning
Year:
2002


Abstract

The problem of state abstraction is of central importance in optimal control, reinforcement learning and Markov decision processes. This paper studies the case of variable resolution state abstraction for continuous time and space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches based on value function and policy properties that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently calculable measure of the extent to which changes in some state affect the value function of some other states. Variance is an efficiently calculable measure of how risky some state is in a Markov chain: a low-variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, given by the value function.

The paper proceeds by graphically demonstrating the various approaches to splitting on the familiar, non-linear, non-minimum phase, two-dimensional problem of the “Car on the hill”. It then evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.
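
The two non-local measures described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's formulation, which treats continuous time and space. It assumes a small finite Markov chain with a per-step discount factor `gamma`, and all function names are hypothetical. Under those assumptions, influence is the sensitivity of the value function to the reward at one state, and variance is the variance of the discounted long-term reward, both obtained by simple fixed-point iteration.

```python
def solve_fixed_point(P, b, factor, iters=2000):
    """Iterate x <- b + factor * P x; converges when 0 <= factor < 1."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [b[i] + factor * sum(P[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x


def value_and_variance(P, r, gamma):
    """Return (V, var) for a discounted Markov chain (illustrative assumption).

    V solves V = r + gamma * P V.
    var solves var = e + gamma**2 * P var, where e(x) is the expected squared
    one-step deviation of r(x) + gamma * V(next) from V(x).
    """
    n = len(r)
    V = solve_fixed_point(P, r, gamma)
    e = [sum(P[i][j] * (r[i] + gamma * V[j] - V[i]) ** 2 for j in range(n))
         for i in range(n)]
    var = solve_fixed_point(P, e, gamma ** 2)
    return V, var


def influence(P, gamma, source):
    """Sensitivity dV(x)/dr(source) for every state x.

    This is the `source` column of (I - gamma * P)^-1, computed iteratively.
    """
    n = len(P)
    unit = [1.0 if i == source else 0.0 for i in range(n)]
    return solve_fixed_point(P, unit, gamma)


# Toy chain: state 0 jumps to state 1 or state 2 with equal probability.
# State 1 is absorbing with reward 1; state 2 is absorbing with reward 0.
P = [[0.0, 0.5, 0.5],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
r = [0.0, 1.0, 0.0]
gamma = 0.9

V, var = value_and_variance(P, r, gamma)
infl_of_state_1 = influence(P, gamma, source=1)
print("value:", V)
print("variance:", var)
print("influence of state 1:", infl_of_state_1)
```

In this toy chain, state 0 is the risky one: its long-term reward is either 9 or 0 depending on a single coin flip, so its variance is large, while the two absorbing states have zero variance. The reward at state 1 influences the value of states 0 and 1 but not of state 2, which never reaches it.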