A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis

Authors:
Abhijit Gosavi
Affiliations:
Department of Industrial Engineering, The State University of New York at Buffalo, 342 Bell Hall Box 602050, Buffalo, NY 14260-2050, USA. agosavi@buffalo.edu
Venue:
Machine Learning
Year:
2004

Abstract

We present a reinforcement learning (RL) algorithm based on policy iteration for solving average-reward Markov and semi-Markov decision problems. Algorithms based on policy iteration, as well as actor-critic algorithms, have already appeared in the literature on discounted-reward RL. Our algorithm is an asynchronous, model-free algorithm that can be used on large-scale problems; it hinges on computing the value function of a given policy and searching over policy space. In the applied operations research community, RL has been used to derive good solutions to problems previously considered intractable. We therefore test the proposed algorithm on a commercially significant case study drawn from a real-world problem in the airline industry. The case study focuses on yield management, which has been hailed as the key factor for generating profits in the airline industry. In the experiments, we use our algorithm with a nearest-neighbor approach to tackle a large state space. We also present a convergence analysis of the algorithm via an ordinary differential equation method.
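For readers unfamiliar with the underlying idea, the following is a minimal sketch of classical, model-based policy iteration for an average-reward MDP. It is not the algorithm of the paper, which is model-free and asynchronous. It only illustrates the two steps the abstract names: computing the value function of a given policy, then searching over policy space. The two-state MDP, the function names, and the unichain assumption are all hypothetical and chosen for illustration.

```python
import numpy as np

def evaluate_policy(P, R, policy):
    """Solve rho + h(s) = r_pi(s) + sum_s' P_pi(s, s') h(s') with h(0) = 0.

    Assumes the chain induced by the policy is unichain, so the system has a
    unique solution. P has shape (states, actions, states); R has shape
    (states, actions).
    """
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]          # transition matrix under the policy
    r_pi = R[idx, policy]          # one-step reward under the policy
    A = np.eye(n) - P_pi
    A[:, 0] = 1.0                  # h(0) is pinned to 0, so column 0 carries rho
    x = np.linalg.solve(A, r_pi)
    rho = float(x[0])              # average reward (gain) of the policy
    h = x.copy()
    h[0] = 0.0                     # relative value function (bias up to a constant)
    return rho, h

def improve_policy(P, R, policy, h, tol=1e-10):
    """Greedy improvement; keeps the current action unless another is strictly better."""
    q = R + P @ h                  # shape (states, actions)
    new_policy = policy.copy()
    for s in range(P.shape[0]):
        best = int(np.argmax(q[s]))
        if q[s, best] > q[s, policy[s]] + tol:
            new_policy[s] = best
    return new_policy

def policy_iteration(P, R, max_iters=100):
    policy = np.zeros(P.shape[0], dtype=int)
    for _ in range(max_iters):
        rho, h = evaluate_policy(P, R, policy)
        new_policy = improve_policy(P, R, policy, h)
        if np.array_equal(new_policy, policy):
            break                  # policy is stable, so stop
        policy = new_policy
    return policy, rho

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],      # from state 0 under actions 0 and 1
    [[0.9, 0.1], [0.1, 0.9]],      # from state 1 under actions 0 and 1
])
R = np.array([
    [1.0, 0.0],                    # rewards in state 0
    [0.0, 2.0],                    # rewards in state 1
])

policy, rho = policy_iteration(P, R)
print(policy, rho)
```

The sketch evaluates each policy exactly from a known transition model. A model-free method such as the one described in the abstract would instead estimate these quantities from simulated or observed transitions.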