We present a reinforcement learning (RL) algorithm based on policy iteration for solving average-reward Markov and semi-Markov decision problems. In the literature on discounted-reward RL, algorithms based on policy iteration, as well as actor-critic algorithms, have already appeared. Our algorithm is asynchronous and model-free, and can be used on large-scale problems; it hinges on computing the value function of a given policy and then searching over the policy space. In the applied operations research community, RL has been used to obtain good solutions to problems previously considered intractable. In this paper, we therefore test the proposed algorithm on a commercially significant case study drawn from a real-world problem in the airline industry. The case study focuses on yield management, which has been described as the key factor in generating profits in the airline industry. In our experiments, we combine the algorithm with a nearest-neighbor approach to handle the large state space. We also present a convergence analysis of the algorithm based on the ordinary differential equation (ODE) method.
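The general scheme described above can be illustrated with a small sketch: estimate the average reward ρ of a fixed policy by simulation, learn that policy's relative action values from sampled transitions using an update of the assumed form P(s, a) ← P(s, a) + α [r − ρ·t + P(s′, μ(s′)) − P(s, a)], and then improve the policy greedily. This is a minimal illustration under stated assumptions, not a reproduction of the paper's algorithm. The two-state semi-Markov model, the step-size rule, the uniform exploration, and every name (`MODEL`, `estimate_rho`, `evaluate_policy`, `policy_iteration`, `exact_gain`) are hypothetical. The sketch also uses a lookup table instead of the nearest-neighbor approach the paper uses for large state spaces.

```python
import random

# Hypothetical two-state SMDP (illustrative numbers, not from the paper).
# MODEL[s][a] = (reward, sojourn time, probability that the next state is 0).
# Rewards and times are deterministic here to keep the sketch short.
MODEL = {
    0: {0: (6.0, 2.0, 0.5), 1: (1.0, 1.0, 0.5)},
    1: {0: (2.0, 1.0, 0.2), 1: (12.0, 4.0, 0.5)},
}
STATES, ACTIONS = (0, 1), (0, 1)


def simulate(s, a, rng):
    """Sample one transition; returns (next_state, reward, sojourn_time)."""
    reward, time, p_to_0 = MODEL[s][a]
    nxt = 0 if rng.random() < p_to_0 else 1
    return nxt, reward, time


def estimate_rho(policy, rng, steps=20_000):
    """Estimate the average reward per unit time of a fixed policy by simulation."""
    s, total_r, total_t = 0, 0.0, 0.0
    for _ in range(steps):
        nxt, r, t = simulate(s, policy[s], rng)
        total_r += r
        total_t += t
        s = nxt
    return total_r / total_t


def evaluate_policy(policy, rho, rng, steps=20_000):
    """Model-free estimate of the relative action values P(s, a) of `policy`.

    Actions are explored uniformly at random (an assumption); the target
    bootstraps from the action the evaluated policy takes in the next state,
    and each sojourn time is charged at the rate rho.
    """
    P = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {key: 0 for key in P}
    s = 0
    for _ in range(steps):
        a = rng.choice(ACTIONS)
        nxt, r, t = simulate(s, a, rng)
        visits[(s, a)] += 1
        alpha = 1.0 / (1.0 + 0.01 * (visits[(s, a)] - 1))  # decaying step size
        target = r - rho * t + P[(nxt, policy[nxt])]
        P[(s, a)] += alpha * (target - P[(s, a)])
        s = nxt
    return P


def policy_iteration(seed=0, max_rounds=10):
    """Alternate simulated policy evaluation with greedy policy improvement."""
    rng = random.Random(seed)
    policy = {s: 0 for s in STATES}  # arbitrary starting policy
    for _ in range(max_rounds):
        rho = estimate_rho(policy, rng)
        P = evaluate_policy(policy, rho, rng)
        # Greedy improvement; only differences between P-values matter here.
        new_policy = {s: max(ACTIONS, key=lambda a: P[(s, a)]) for s in STATES}
        if new_policy == policy:
            break
        policy = new_policy
    return policy, rho  # rho: estimated gain of the last policy evaluated


def exact_gain(policy):
    """Exact average reward of a policy in this two-state model (used only as a check)."""
    p01 = 1.0 - MODEL[0][policy[0]][2]  # P(0 -> 1)
    p10 = MODEL[1][policy[1]][2]        # P(1 -> 0)
    pi0 = p10 / (p01 + p10)             # stationary distribution of the embedded chain
    pi = {0: pi0, 1: 1.0 - pi0}
    num = sum(pi[s] * MODEL[s][policy[s]][0] for s in STATES)
    den = sum(pi[s] * MODEL[s][policy[s]][1] for s in STATES)
    return num / den


if __name__ == "__main__":
    policy, rho = policy_iteration()
    best = max(({0: a0, 1: a1} for a0 in ACTIONS for a1 in ACTIONS), key=exact_gain)
    print("learned policy:", policy, "estimated rho:", round(rho, 3))
    print("exact optimum: ", best, "gain:", round(exact_gain(best), 3))
```

Only `estimate_rho` and `evaluate_policy` interact with the model, and they do so solely through sampled transitions; `exact_gain` reads the model directly and serves only to check the result. Charging each sojourn time at the current gain ρ is what makes the semi-Markov case differ from the unit-time Markov case. In this toy model, the quick, low-reward action in state 1 loses to the slow, high-reward action once its holding time is priced at ρ.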