Policy iteration based on a learned transition model

  • Authors:
  • Vivek Ramavajjala; Charles Elkan

  • Affiliations:
  • Department of Computer Science & Engineering, University of California, San Diego, CA (both authors)

  • Venue:
  • ECML PKDD'12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
  • Year:
  • 2012

Abstract

This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal state-action value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state instead of directly on the current state and action. This approach makes it easier to define useful basis functions, and hence to learn a useful linear approximation of the value function. Experiments show that the new algorithm, called NSPI for next-state policy iteration, performs well on two standard benchmarks, the well-known mountain car and inverted pendulum swing-up tasks. More importantly, the NSPI algorithm performs well, and better than a specialized recent method, on a resource management task known as the day-ahead wind commitment problem. This latter task has action and state spaces that are high-dimensional and continuous.
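The core idea in the abstract, letting the approximate value function depend on features of a learned estimate of the expected next state rather than on the current state and action directly, can be illustrated with a short sketch. This is a minimal illustration under assumptions not stated in the abstract: a linear least-squares transition model, a discrete set of candidate action vectors, and identity-plus-bias basis functions. Names such as `fit_model`, `phi`, and `lstd_q` are hypothetical, not the authors' code.

```python
import numpy as np

def phi(next_state):
    # Basis functions defined over the (predicted) next state only;
    # here simply the raw state components plus a bias term.
    return np.append(next_state, 1.0)

def fit_model(S, A, S_next):
    # Learn E[s' | s, a] by least squares on observed transitions,
    # with states S, action vectors A, and successor states S_next.
    X = np.hstack([S, A])
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return lambda s, a: np.concatenate([s, a]) @ W

def q_value(w, model, s, a):
    # Q(s, a) is linear in features of the expected next state,
    # not in features of (s, a) directly.
    return w @ phi(model(s, a))

def lstd_q(samples, model, actions, w, gamma=0.99):
    # One LSTD-Q policy-evaluation step for the greedy policy implied by w,
    # using (s, a, r, s') sample tuples.
    k = len(phi(samples[0][3]))
    A_mat, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        a_next = max(actions, key=lambda u: q_value(w, model, s_next, u))
        f, f_next = phi(model(s, a)), phi(model(s_next, a_next))
        A_mat += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A_mat + 1e-6 * np.eye(k), b)
```

Policy iteration in this style alternates the evaluation step in `lstd_q` with the greedy improvement implicit in the action maximization, starting from an initial weight vector such as `np.zeros(k)`.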