Passenger-based predictive modeling of airline no-show rates

Authors:
Richard D. Lawrence; Se June Hong; Jacques Cherrier
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY; Air Canada, Dorval, Quebec
Venue:
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Year:
2003

Abstract

Airlines routinely overbook flights because they expect some fraction of booked passengers not to show up for a given flight. Accurate forecasts of the expected number of no-shows per flight can increase airline revenue by reducing both the number of spoiled seats (empty seats that might otherwise have been sold) and the number of involuntary denied boardings at the departure gate. Conventional no-show forecasting methods typically average the no-show rates of historically similar flights, without using any passenger-specific information.

We develop two classes of models that predict cabin-level no-show rates from specific information on the individual passengers booked on each flight. The first, a passenger-level model, computes the no-show probability of each passenger, using both the cabin-level historical forecast and the extracted passenger features as explanatory variables. This passenger-level model is implemented with three different predictive methods: a C4.5 decision tree, a segmented Naive Bayes algorithm, and a new aggregation method for an ensemble of probabilistic models. The second, a cabin-level model, takes the desired cabin-level no-show rate itself as the response variable. Its inputs include the cabin-level no-show rates predicted by the various passenger-level models, as well as simple statistics of the passenger features in each cabin. The cabin-level model is implemented either as a linear regression or as a direct probability model that explicitly incorporates the cabin-level no-show rates derived from the passenger-level model outputs.

The new passenger-based models are compared with a conventional historical model, using training and evaluation data sets drawn from more than 1 million passenger name records. Standard metrics, such as lift curves and cabin-level mean-square error, show that the passenger-based models are more accurate than the historical model.
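The passenger-to-cabin aggregation and the error comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-passenger no-show probabilities are taken as given (any of the passenger-level methods could supply them), the cabin rate is their mean, the historical baseline averages made-up rates of "similar" flights, and the cabin-level model is reduced to a one-input least-squares line. The function names and all numbers are hypothetical; the paper's cabin-level regression also uses passenger-feature statistics, which are omitted here.

```python
from statistics import fmean


def cabin_no_show_rate(passenger_probs):
    """Expected cabin-level no-show rate from per-passenger no-show probabilities.

    If passenger i fails to show with probability p_i, the expected number of
    no-shows in the cabin is sum(p_i), so the expected rate is the mean of p_i.
    """
    if not passenger_probs:
        raise ValueError("cabin has no booked passengers")
    return fmean(passenger_probs)


def historical_rate(similar_flight_rates):
    """Conventional baseline: mean no-show rate of historically similar flights."""
    return fmean(similar_flight_rates)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope).

    Hypothetical stand-in for a one-input cabin-level regression of the observed
    rate on a passenger-derived rate (the paper's regression has more inputs).
    """
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def mean_square_error(predicted, actual):
    """Mean-square error between predicted and observed cabin no-show rates."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return fmean((p - obs) ** 2 for p, obs in zip(predicted, actual))


# Made-up example: two cabins with hypothetical per-passenger probabilities.
passenger_probs = {"A": [0.05, 0.10, 0.40, 0.05], "B": [0.20, 0.30]}
observed = [0.25, 0.50]  # made-up observed cabin no-show rates
passenger_pred = [cabin_no_show_rate(p) for p in passenger_probs.values()]
baseline = historical_rate([0.10, 0.12, 0.14])

# Hypothetical training cabins for the one-input cabin-level model.
intercept, slope = fit_line([0.10, 0.20, 0.30], [0.12, 0.22, 0.32])
adjusted = [intercept + slope * x for x in passenger_pred]

print("passenger-based MSE:", mean_square_error(passenger_pred, observed))
print("cabin-level MSE:", mean_square_error(adjusted, observed))
print("historical MSE:", mean_square_error([baseline] * len(observed), observed))
```

Averaging probabilities works because expectation is linear: the expected no-show count is the sum of the individual probabilities whether or not the passengers' decisions are independent.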
All models are also evaluated with a simple revenue model, which shows that the cabin-level passenger-based model can yield a revenue gain of 0.4% to 3.2% over the conventional model, depending on the revenue-model parameters.
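To make the trade-off between spoiled seats and denied boardings concrete, here is a toy single-cabin revenue model. It is not the paper's revenue model, whose form and parameters are not given here; it assumes unlimited demand (so the airline always books up to a limit of capacity / (1 - forecast rate)), revenue only from boarded passengers, and a fixed cost per denied boarding. The names `simulate_flight` and `FlightOutcome` and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightOutcome:
    booking_limit: int
    shows: int
    boarded: int
    spoiled_seats: int      # empty seats that could have been sold (unlimited demand assumed)
    denied_boardings: int   # passengers who show up but cannot board
    revenue: float


def simulate_flight(capacity, forecast_rate, actual_rate, fare, denied_boarding_cost):
    """Toy single-cabin revenue model (hypothetical, not the paper's model)."""
    if not 0 <= forecast_rate < 1:
        raise ValueError("forecast_rate must be in [0, 1)")
    # Overbook so that the *forecast* number of shows just fills the cabin.
    booking_limit = math.floor(capacity / (1 - forecast_rate))
    # Realized shows under the actual no-show rate.
    shows = round(booking_limit * (1 - actual_rate))
    boarded = min(shows, capacity)
    denied = max(shows - capacity, 0)
    spoiled = capacity - boarded
    revenue = fare * boarded - denied_boarding_cost * denied
    return FlightOutcome(booking_limit, shows, boarded, spoiled, denied, revenue)


# Same flight (actual no-show rate 0.15) under three hypothetical forecasts.
for label, forecast in [("under-forecast", 0.10), ("close forecast", 0.14), ("over-forecast", 0.25)]:
    outcome = simulate_flight(100, forecast, actual_rate=0.15, fare=300.0, denied_boarding_cost=800.0)
    print(label, outcome)
```

Under these assumptions, a forecast below the actual rate leaves seats empty, while one well above it triggers costly denied boardings, so forecast error in either direction costs revenue.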