Ensemble Learning for Imbalanced E-commerce Transaction Anomaly Classification

  • Authors: Haiqin Yang; Irwin King

  • Affiliations: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong (both authors)

  • Venue: ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part I
  • Year: 2009

Abstract

This paper presents the main results of our ongoing work, one month before the deadline, on the 2009 UC San Diego data mining contest. The contest tasks are to rank the samples in two e-commerce transaction anomaly datasets by the probability that each sample has a positive label. Performance is evaluated by the lift at 20% on the predicted probabilities for the two datasets. A main difficulty is that the data are highly imbalanced: only about 2% of the samples are labeled positive in both tasks. We first preprocess the categorical features and normalize all features. We then present initial results for several popular classifiers, including Support Vector Machines, Neural Networks, AdaBoost, and Logistic Regression. The objective is to obtain benchmark results for these classifiers without much modification, which will help us select a classifier for future tuning. Based on these results, we observe that the area under the ROC curve (AUC) is a good indicator for improving the lift score. We therefore propose an ensemble method that combines the above classifiers with the aim of optimizing the AUC score, and obtain significantly better results. We also discuss some treatments of the imbalanced data in the experiments.
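
To make the two evaluation measures named in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It assumes that "lift at 20%" means the share of all positives captured in the top 20% of samples ranked by score, divided by 0.2; the contest's exact definition is not given here. The function names `lift_at`, `auc`, and `auc_weighted_average`, and the toy data, are hypothetical. The AUC-weighted average is only one simple way to combine classifier scores and is not claimed to be the ensemble method of the paper.

```python
def lift_at(labels, scores, fraction=0.2):
    """Lift in the top `fraction` of samples, ranked by descending score.

    Assumed definition: (share of all positives found in the top k samples)
    divided by (share of samples examined, k / n).
    """
    n = len(labels)
    k = max(1, int(round(fraction * n)))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_pos = sum(labels[i] for i in order[:k])
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    return (top_pos / total_pos) / (k / n)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_weighted_average(labels, score_lists):
    """Illustrative ensemble (an assumption, not the paper's method):
    average the classifiers' scores, weighting each by its AUC.
    In practice the weights should come from held-out data."""
    weights = [auc(labels, s) for s in score_lists]
    total = sum(weights)
    n = len(labels)
    return [sum(w * s[i] for w, s in zip(weights, score_lists)) / total
            for i in range(n)]


# Toy data: 10 samples, 2 positives (made up for illustration).
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores_a = [0.9, 0.8, 0.1, 0.2, 0.3, 0.15, 0.05, 0.25, 0.12, 0.7]
scores_b = [0.6, 0.1, 0.2, 0.3, 0.1, 0.2, 0.1, 0.4, 0.3, 0.8]

lift_a = lift_at(labels, scores_a)   # top 2 of 10 hold 1 of 2 positives
auc_a = auc(labels, scores_a)        # 15 of 16 positive/negative pairs ordered correctly
combined = auc_weighted_average(labels, [scores_a, scores_b])
```

On this toy data, `lift_a` is 2.5 and `auc_a` is 0.9375. With only about 2% positives, as in the contest data, the maximum attainable lift at 20% under this definition is 5, reached when every positive falls in the top fifth of the ranking.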