Auto-exploratory average reward reinforcement learning

Authors:
DoKyeong Ok;Prasad Tadepalli
Affiliations:
Computer Science Department, Oregon State University, Corvallis, Oregon;Computer Science Department, Oregon State University, Corvallis, Oregon
Venue:
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Year:
1996

Abstract

We introduce a model-based average-reward reinforcement learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning that automatically explores the unexplored parts of the state space while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning combined with previously studied exploration methods such as random, recency-based, or counter-based exploration.
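The abstract describes H-learning only at a high level. The Python sketch below shows one way a tabular, model-based average-reward learner of this kind can be organized. It estimates transition probabilities and expected rewards from counts, keeps relative values h(i) and an average-reward estimate rho, and backs up h(i) = max_u [r_i(u) + sum_j p_ij(u) h(j)] - rho. The step order, the alpha <- alpha/(alpha+1) schedule, the class name `HLearningSketch`, and the toy two-state task are assumptions for illustration. They are not the paper's code or its robot scheduling domain. The agent always acts greedily. Starting rho high (the hypothetical values rho0=2.0 and alpha0=0.1) pushes h down in every state that gets backed up, so untried actions, which score 0, become greedy. This is one plausible reading of the auto-exploratory idea, not a definitive implementation.

```python
from collections import defaultdict


class HLearningSketch:
    """Hypothetical sketch of a tabular, model-based average-reward learner
    in the style of H-learning (not the authors' code)."""

    def __init__(self, actions, rho0=0.0, alpha0=1.0):
        self.actions = list(actions)     # assumption: same actions in every state
        self.h = defaultdict(float)      # relative values h(i), start at 0
        self.rho = rho0                  # average-reward estimate
        self.alpha = alpha0              # step size for rho (assumed schedule)
        self.n_sa = defaultdict(int)     # N(i, u): times u was taken in i
        self.n_sas = defaultdict(int)    # N(i, u, j): times that led to j
        self.r_hat = defaultdict(float)  # running mean reward r_i(u)
        self.succ = defaultdict(set)     # successors j observed for (i, u)

    def q(self, i, u):
        """r_i(u) + sum_j p_ij(u) h(j); untried actions score 0."""
        n = self.n_sa[(i, u)]
        if n == 0:
            return 0.0
        expected_h = sum(self.n_sas[(i, u, j)] / n * self.h[j]
                         for j in self.succ[(i, u)])
        return self.r_hat[(i, u)] + expected_h

    def greedy_action(self, i):
        # max() returns the first maximiser, so ties break by action order
        return max(self.actions, key=lambda u: self.q(i, u))

    def update(self, i, a, reward, k):
        # 1. update the count-based model from the transition (i, a) -> k
        self.n_sa[(i, a)] += 1
        self.n_sas[(i, a, k)] += 1
        self.succ[(i, a)].add(k)
        self.r_hat[(i, a)] += (reward - self.r_hat[(i, a)]) / self.n_sa[(i, a)]
        # 2. adjust rho only when a is greedy under the updated model
        values = {u: self.q(i, u) for u in self.actions}
        best = max(values.values())
        if values[a] == best:
            target = self.r_hat[(i, a)] - self.h[i] + self.h[k]
            self.rho = (1 - self.alpha) * self.rho + self.alpha * target
            self.alpha = self.alpha / (self.alpha + 1)
        # 3. one-step backup of h(i) relative to the average reward
        self.h[i] = best - self.rho


def step(state, action):
    """Hypothetical 2-state task: 'move' switches state, 'stay' keeps it.
    Only staying in state 1 pays 1, so the optimal average reward is 1."""
    if action == "move":
        return 0.0, 1 - state
    return (1.0 if state == 1 else 0.0), state


def run(agent, steps, state=0):
    rewards = []
    for _ in range(steps):
        a = agent.greedy_action(state)  # always greedy: no random actions
        r, nxt = step(state, a)
        agent.update(state, a, r, nxt)
        rewards.append(r)
        state = nxt
    return rewards


# High initial rho and small alpha0 are assumed settings for auto-exploration.
agent = HLearningSketch(["stay", "move"], rho0=2.0, alpha0=0.1)
rewards = run(agent, 5000)
print(round(agent.rho, 3), sum(rewards[-1000:]))
```

On this toy task the agent first tries both actions in state 0 because visited actions fall below the untried ones, then settles on staying in state 1. Its rho estimate should approach 1 without any random exploration, although how fast depends on the assumed rho0 and alpha0.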