Passenger-based predictive modeling of airline no-show rates

  • Authors:
  • Richard D. Lawrence;Se June Hong;Jacques Cherrier

  • Affiliations:
  • IBM T. J. Watson Research Ctr, Yorktown Heights, NY;IBM T. J. Watson Research Ctr, Yorktown Heights, NY;Air Canada, Dorval, Quebec

  • Venue:
  • Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Airlines routinely overbook flights based on the expectation that some fraction of booked passengers will not show for each flight. Accurate forecasts of the expected number of no-shows for each flight can increase airline revenue by reducing the number of spoiled seats (empty seats that might otherwise have been sold) and the number of involuntary denied boardings at the departure gate. Conventional no-show forecasting methods typically average the no-show rates of historically similar flights, without the use of passenger-specific information.We develop two classes of models to predict cabin-level no-show rates using specific information on the individual passengers booked on each flight. The first of these models computes the no-show probability for each passenger, using both the cabin-level historical forecast and the extracted passenger features as explanatory variables. This passenger-level model is implemented using three different predictive methods: a C4.5 decision-tree, a segmented Naive Bayes algorithm, and a new aggregation method for an ensemble of probabilistic models. The second cabin-level model is formulated using the desired cabin-level no-show rate as the response variable. Inputs to this model include the predicted cabin-level no-show rates derived from the various passenger-level models, as well as simple statistics of the features of the cabin passenger population. The cabin-level model is implemented using either linear regression, or as a direct probability model with explicit incorporation of the cabin-level no-show rates derived from the passenger-level model outputs.The new passenger-based models are compared to a conventional historical model, using train and evaluation data sets taken from over 1 million passenger name records. Standard metrics such as lift curves and mean-square cabin-level errors establish the improved accuracy of the passenger-based models over the historical model. All models are also evaluated using a simple revenue model, and it is shown that the cabin-level passenger-based model can produce between 0.4% and 3.2% revenue gain over the conventional model, depending on the revenue-model parameters.