Active policy iteration: efficient exploration through active learning for value function approximation in reinforcement learning

  • Authors:
  • Takayuki Akiyama; Hirotaka Hachiya; Masashi Sugiyama

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology (all authors)

  • Venue:
  • IJCAI'09: Proceedings of the 21st International Joint Conference on Artificial Intelligence
  • Year:
  • 2009

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies that enable efficient exploration, which is particularly useful when the cost of sampling immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.
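The abstract's core idea, that value function approximation by least squares can borrow criteria from statistical active learning for linear regression, can be illustrated with a minimal sketch. The snippet below is not the paper's API algorithm; it shows a generic A-optimality criterion (minimizing the trace of the inverse design matrix, i.e., the average predictive variance) to choose where to sample next for a toy linear model. All names, the feature map, and the noisy oracle are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pool of candidate inputs; linear model with features phi(x) = [1, x]
pool = np.linspace(-2.0, 2.0, 41)
phi = lambda x: np.array([1.0, x])

def query(x):
    # Hypothetical noisy oracle standing in for a costly reward sample
    return 0.5 * x + 0.1 * rng.standard_normal()

# Start from two labelled points at the ends of the input range
X = np.array([phi(-2.0), phi(2.0)])
y = np.array([query(-2.0), query(2.0)])

for _ in range(8):
    A = X.T @ X
    # A-optimal design: pick the pool point whose addition minimises
    # the trace of the inverse design matrix
    scores = []
    for x in pool:
        p = phi(x)
        scores.append(np.trace(np.linalg.inv(A + np.outer(p, p))))
    x_star = pool[int(np.argmin(scores))]
    X = np.vstack([X, phi(x_star)])
    y = np.append(y, query(x_star))

# Least-squares fit from the actively collected samples
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(w)  # estimated [intercept, slope]
```

The same idea carries over to LSPI in the paper: because the value function is fitted by least squares on linear features, the quality of a candidate sampling policy can be scored by such variance-based criteria before expensive rewards are collected.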