Scaling model-based average-reward reinforcement learning for product delivery

Authors:
Scott Proper;Prasad Tadepalli
Affiliations:
Oregon State University, Corvallis, OR;Oregon State University, Corvallis, OR
Venue:
ECML'06 Proceedings of the 17th European conference on Machine Learning
Year:
2006

References (7):
Model-based average reward reinforcement learning. Artificial Intelligence
Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands. Computers and Operations Research - Neural networks in business
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Coordinated Reinforcement Learning. ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
A Rollout Policy for the Vehicle Routing Problem with Stochastic Demands. Operations Research
Learning to Communicate and Act Using Hierarchical Reinforcement Learning. AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3

Cited by (4):
Solving multiagent assignment Markov decision processes. Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Transfer learning via relational templates. ILP'09 Proceedings of the 19th international conference on Inductive logic programming
Robustness of optimal channel reservation using handover prediction in multiservice wireless networks. Wireless Networks
Modeling difference rewards for multiagent learning. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3

Abstract

Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery – an optimization problem that combines inventory control and vehicle routing.
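
The abstract does not spell out which form of hill climbing the authors use, so the following is only a minimal sketch of the general idea, assuming a coordinate-wise variant: change one agent's action at a time and keep the change only when it raises the estimated value of the joint action. The function name hill_climb_joint_action, the toy_value function, and the integer action encoding are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple


def hill_climb_joint_action(
    num_agents: int,
    actions: Sequence[int],
    value: Callable[[Tuple[int, ...]], float],
    start: Optional[Tuple[int, ...]] = None,
) -> Tuple[int, ...]:
    """Coordinate-wise hill climbing over joint actions (illustrative only).

    `value` is a hypothetical stand-in for whatever estimate the learner
    uses to score a joint action in the current state.
    """
    # Start from a given joint action, or from the first action for every agent.
    joint = list(start) if start is not None else [actions[0]] * num_agents
    best = value(tuple(joint))

    improved = True
    while improved:
        improved = False
        for agent in range(num_agents):
            for a in actions:
                if a == joint[agent]:
                    continue
                # Change only this agent's action and re-score the joint action.
                candidate = joint.copy()
                candidate[agent] = a
                v = value(tuple(candidate))
                if v > best:  # accept strict improvements only, so the loop terminates
                    joint, best = candidate, v
                    improved = True
    return tuple(joint)


# Toy scoring function (an assumption, not from the paper): each agent has a
# preferred action, and the score falls with the distance from it.
TARGET = (2, 0, 1)


def toy_value(joint: Tuple[int, ...]) -> float:
    return -float(sum(abs(a - t) for a, t in zip(joint, TARGET)))


best = hill_climb_joint_action(3, [0, 1, 2], toy_value)
```

Each sweep scores at most num_agents * (len(actions) - 1) candidates, whereas exhaustive search scores len(actions) ** num_agents joint actions. The price is that a search of this kind can stop at a local optimum when the agents' actions interact; the toy function above is separable, so the sketch happens to reach the global maximum.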