Combining a new data classification technique and regression analysis to predict the Cost-To-Serve new customers

Authors:
Estelle R. S. Kone;Mark H. Karwan
Affiliations:
Department of Industrial and Systems Engineering, 438 Bell Hall, University at Buffalo (SUNY), Buffalo, NY 14260, United States;Operations Research, Department of Industrial and Systems Engineering, 438 Bell Hall, University at Buffalo (SUNY), Buffalo, NY 14260, United States
Venue:
Computers and Industrial Engineering
Year:
2011

Abstract

Identifying the Cost-To-Serve (CTS) of customers is one of the most challenging problems in Supply Chain Management because of the diversity of their business activities. For the particular case of the industrial gas business, we are interested in predicting the cost of delivering bulk (liquefied) gas to new customers using a multifactor linear regression model. Developing a single model, i.e., analyzing all observations at once, produces poor prediction results. Therefore, prior to the regression analysis, a new supervised learning technique is used to group customers who are similar in some sense. Classes of customers are represented by hyper-boxes, and a linear regression model is subsequently built within each class. The combination of data classification and regression is shown to increase the accuracy of the prediction. Two Mixed-Integer Linear Programming (MILP) models are developed for data classification purposes. Although we are dealing with a supervised learning method, classes are not predefined in our case. Rather, we input a continuous "classification" attribute that is optimally discretized by the MILPs in order to minimize the number of misclassifications. Our data classification model therefore offers a broader range of applications. A number of illustrative examples are used to demonstrate the effectiveness of the proposed approach.
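
The classify-then-regress pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the MILP-based discretization is replaced here by a simple quantile split of the continuous classification attribute, and a new customer is assigned to the nearest hyper-box by Euclidean distance, both of which are assumptions made for illustration. The function names `fit_boxes_and_models` and `predict` are hypothetical.

```python
import numpy as np

def fit_boxes_and_models(X, y, class_attr, n_classes=2):
    """Discretize class_attr by quantiles (an assumed stand-in for the MILP),
    then fit one hyper-box and one least-squares model per class."""
    # Interior quantile cut points split the continuous attribute into classes.
    cuts = np.quantile(class_attr, np.linspace(0, 1, n_classes + 1)[1:-1])
    labels = np.searchsorted(cuts, class_attr, side="right")
    classes = []
    for k in range(n_classes):
        Xk, yk = X[labels == k], y[labels == k]
        # Hyper-box: per-feature lower and upper bounds of the class members.
        lo, hi = Xk.min(axis=0), Xk.max(axis=0)
        # Multifactor linear regression with an intercept column.
        A = np.column_stack([np.ones(len(Xk)), Xk])
        coef, *_ = np.linalg.lstsq(A, yk, rcond=None)
        classes.append({"lo": lo, "hi": hi, "coef": coef})
    return classes

def predict(classes, x):
    """Assign x to the closest hyper-box (distance 0 if inside), then apply
    that class's regression model. Nearest-box assignment is an assumption."""
    def box_distance(c):
        # Per-feature gap to the box; zero for features within the bounds.
        gap = np.maximum(c["lo"] - x, 0) + np.maximum(x - c["hi"], 0)
        return np.linalg.norm(gap)
    best = min(classes, key=box_distance)
    return best["coef"][0] + best["coef"][1:] @ x
```

Here `X` holds the customer features (one row per customer), `y` the observed cost, and `class_attr` the continuous attribute used to form classes. Fitting a separate model inside each box lets customers with different cost behavior get different regression coefficients, which is the motivation the abstract gives for not fitting a single model to all observations.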