Linear reinforcement learning (RL) algorithms such as least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate this hypothesis, we provide theoretical statements about the LSTD value-approximation error and the metrics induced by approximation spaces constructed by SFA and by the state-of-the-art methods of Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on this approximation error. Compared to other methods, however, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance-sampling modification to compensate for this effect. Finally, the LSTD and least-squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA, and PCA is compared on benchmark tasks and in a visual robot-navigation experiment (both in a realistic simulation and with a real robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces, and (iii) approximation spaces that encode diffusion distances facilitate LSPI performance.
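To make the setting concrete, the following is a minimal sketch (not the article's implementation) of LSTD with linear features: given sampled transitions with features `phi` of the visited states and `phi_next` of their successors, LSTD solves the linear system A w = b with A = Φᵀ(Φ − γΦ′) and b = Φᵀr. The feature matrices are assumed to come from some basis-construction method such as SFA or PVF, and the small `ridge` term is an illustrative addition for numerical stability, not part of the standard algorithm.

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.95, ridge=1e-6):
    """Least-squares temporal difference learning (LSTD) sketch.

    phi      : (n, k) basis features of sampled states s_t
    phi_next : (n, k) basis features of successor states s_{t+1}
    rewards  : (n,)   observed rewards
    Returns weights w such that V(s) is approximated by phi(s) @ w.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # Ridge term keeps A invertible when the features are rank-deficient
    # (an illustrative safeguard, not part of vanilla LSTD).
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)
```

With tabular (one-hot) features and every transition of a small deterministic MDP observed once, this recovers the exact value function of the empirical model, which makes the sketch easy to sanity-check.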