Appropriately designing the sampling policy is important for obtaining good control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the cost of sampling immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.
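To make the LSPI framework referenced in the abstract concrete, here is a minimal sketch of least-squares policy iteration on a toy two-state MDP. The MDP, feature map, and all names are illustrative assumptions of this sketch, not the paper's batting-robot setup or its active-learning sampling design.

```python
import numpy as np

# Toy MDP (an assumption for illustration): two states, two actions.
n_states, n_actions, gamma = 2, 2, 0.9
dim = n_states * n_actions

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    f = np.zeros(dim)
    f[s * n_actions + a] = 1.0
    return f

def step(s, a):
    """Deterministic dynamics: action 0 stays, action 1 switches states.
    Reward 1 is obtained whenever the next state is state 1."""
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def greedy(w, s):
    """Greedy action with respect to the linear Q-function phi(s, a) @ w."""
    return int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))

# Samples gathered by a fixed (here: exhaustive) sampling policy.
samples = [(s, a, *step(s, a)) for s in range(n_states) for a in range(n_actions)]

# Policy iteration: repeatedly fit Q^pi by least squares (LSTD-Q), then improve.
w = np.zeros(dim)
for _ in range(10):
    A = 1e-6 * np.eye(dim)  # small ridge term keeps A invertible
    b = np.zeros(dim)
    for s, a, s_next, r in samples:
        f = phi(s, a)
        A += np.outer(f, f - gamma * phi(s_next, greedy(w, s_next)))
        b += r * f
    w = np.linalg.solve(A, b)

policy = [greedy(w, s) for s in range(n_states)]
print(policy)  # greedy policy per state: switch in state 0, stay in state 1
```

Because the one-hot features are exact for this tiny MDP, the loop recovers the optimal policy; the paper's contribution concerns how the *samples* above should be chosen when each reward observation is expensive.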