Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning

Authors:
Tapas K. Das;Abhijit Gosavi;Sridhar Mahadevan;Nicholas Marchalleck
Venue:
Management Science
Year:
1999



Abstract

A large class of sequential decision-making problems under uncertainty, whose underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow intractably with the size of the problem and its related data. Furthermore, these techniques require, for each action, the one-step transition probability and reward matrices, and obtaining these is often unrealistic for large and complex systems. Recently, there has been much interest in a simulation-based stochastic approximation framework called reinforcement learning (RL) for computing near-optimal policies for MDPs. RL has been successfully applied to very large problems, such as elevator scheduling and dynamic channel allocation in cellular telephone systems. In this paper, we extend RL to a more general class of decision tasks referred to as semi-Markov decision problems (SMDPs). In particular, we focus on SMDPs under the average-reward criterion. We present a new model-free RL algorithm called SMART (Semi-Markov Average Reward Technique). We present a detailed study of this algorithm on a combinatorially large problem of determining the optimal preventive maintenance schedule of a production inventory system. Numerical results from both the theoretical model and the RL algorithm are presented and compared.
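The abstract does not spell out the SMART update, so the following is only a minimal sketch of a model-free, average-reward Q-learning step for an SMDP in the form usually attributed to SMART: the action value moves toward `reward - rho * sojourn + max_b Q(next_state, b)`, and the average-reward estimate `rho` is the ratio of accumulated reward to accumulated time over greedy (non-exploratory) steps. The names `smart_update`, `run_smart`, and `toy_step`, the constant learning and exploration rates, and the one-state toy problem are all assumptions made for illustration; they are not taken from the paper, which studies a much larger preventive maintenance problem.

```python
import random
from collections import defaultdict


def smart_update(q, state, action, reward, sojourn, next_state, actions, rho, alpha):
    """One average-reward SMDP Q-update (sketch, not the authors' code)."""
    best_next = max(q[(next_state, b)] for b in actions)
    # The reward is offset by rho * sojourn: the reward an average-rate
    # policy would have earned during the time this transition took.
    target = reward - rho * sojourn + best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def run_smart(step, actions, start_state, n_steps=20000,
              alpha=0.1, epsilon=0.1, seed=0):
    """Learn from simulated transitions only; no transition matrices needed.

    Assumption: alpha and epsilon are held constant for simplicity.
    """
    rng = random.Random(seed)
    q = defaultdict(float)
    total_reward = 0.0
    total_time = 0.0
    rho = 0.0
    state = start_state
    for _ in range(n_steps):
        explore = rng.random() < epsilon
        if explore:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        next_state, reward, sojourn = step(state, action, rng)
        smart_update(q, state, action, reward, sojourn,
                     next_state, actions, rho, alpha)
        if not explore:
            # Update the reward-rate estimate on greedy steps only.
            total_reward += reward
            total_time += sojourn
            rho = total_reward / total_time
        state = next_state
    return q, rho


def toy_step(state, action, rng):
    """Hypothetical one-state SMDP: returns (next_state, reward, sojourn).

    Action "a" earns 1 over 1 time unit (rate 1.0);
    action "b" earns 3 over 2 time units (rate 1.5).
    """
    if action == "a":
        return state, 1.0, 1.0
    return state, 3.0, 2.0


if __name__ == "__main__":
    q, rho = run_smart(toy_step, ["a", "b"], start_state=0)
    print("estimated average reward rate:", rho)
    print("Q-values:", dict(q))
```

In the toy problem the sojourn times differ between actions, which is the feature that separates an SMDP from an MDP: action "b" has the larger reward rate even though it takes longer, and the `rho * sojourn` term is what lets the update compare the two actions on a per-unit-time basis.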