Average-reward reinforcement learning (ARL) is an undiscounted optimality framework that applies to a broad range of control tasks. ARL computes gain-optimal control policies, which maximize the expected payoff per step. However, gain optimality has intrinsic limitations as an optimality criterion: for example, it cannot distinguish between policies that all reach an absorbing goal state but incur different costs along the way. A more selective criterion is bias optimality, which filters the gain-optimal policies to select those that reach absorbing goals at minimum cost. While several ARL algorithms for computing gain-optimal policies have been proposed, none of them can guarantee bias optimality, since that requires solving at least two nested optimality equations. In this paper, we describe a novel model-based ARL algorithm for computing bias-optimal policies. We test the proposed algorithm on an admission-control queuing system and show that, by learning bias-optimal policies, it utilizes the queue far more efficiently than a gain-optimal method.
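The gain/bias distinction can be illustrated with a minimal sketch (not the paper's algorithm; the toy MDP, policy definitions, and function names are invented for illustration). Two policies both reach a zero-reward absorbing goal, so both have gain 0, but their biases — the expected total transient reward before absorption — differ, and only bias optimality tells them apart:

```python
import numpy as np

# Tiny episodic MDP. States: 0 = start, 1 = detour, 2 = goal (absorbing,
# zero reward). Both policies below absorb at the goal, so both are
# gain-optimal with gain 0; only their bias differs.

def gain_and_bias(P, r, absorbing):
    """Gain and bias of a unichain policy whose recurrent class is a single
    absorbing, zero-reward state. The gain is then 0, and the bias solves
    h = r + P h on the transient states, with h(absorbing) = 0."""
    n = len(r)
    transient = [s for s in range(n) if s != absorbing]
    Pt = P[np.ix_(transient, transient)]   # transitions among transient states
    h = np.zeros(n)
    h[transient] = np.linalg.solve(np.eye(len(transient)) - Pt, r[transient])
    return 0.0, h

# Policy A: jump straight to the goal, paying a one-step reward of -5.
P_A = np.array([[0., 0., 1.],
                [0., 0., 1.],
                [0., 0., 1.]])
r_A = np.array([-5., -1., 0.])

# Policy B: take the detour, paying -1 twice.
P_B = np.array([[0., 1., 0.],
                [0., 0., 1.],
                [0., 0., 1.]])
r_B = np.array([-1., -1., 0.])

g_A, h_A = gain_and_bias(P_A, r_A, absorbing=2)
g_B, h_B = gain_and_bias(P_B, r_B, absorbing=2)
print(g_A, h_A[0])  # gain 0, bias -5 at the start state
print(g_B, h_B[0])  # gain 0, bias -2: B is the bias-optimal policy
```

Gain optimality rates the two policies identically (both average 0 reward per step in the long run); the bias, which here reduces to the expected total cost until absorption, is the quantity a bias-optimal method maximizes among the gain-optimal policies.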