An experimental investigation of the impact of aggregation on the performance of data mining with logistic regression

  • Authors: Adam Fadlalla
  • Affiliations: Department of Computer and Information Science, Cleveland State University, Cleveland, OH
  • Venue: Information and Management
  • Year: 2005

Abstract

We studied the impact of data aggregation on the performance of logistic regression in predicting the direction of the Dow Jones Industrial Average (DJIA) stock market index. Data aggregation is a common operation in business, science, engineering, medicine, and other fields; it is performed for purposes such as statistical, financial, and sales and marketing analysis, particularly within the context of a data warehouse. We showed experimentally that, for this example, as long as aggregation does not shrink the sample size unduly, it does not significantly impair the performance of the logistic regression model in predicting the direction of the DJIA. We also observed that aggregation-based models are simpler (less over-parameterized) than detail-based models. We used receiver operating characteristic (ROC) analysis to evaluate the robustness of these predictive models; specifically, we used the area under the ROC curve as a summary measure of the overall performance of a given model.
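As an illustration of the two operations the abstract relies on, the following minimal Python sketch shows one way to aggregate a series and to compute the area under the ROC curve. It is not the paper's procedure: the abstract does not say how the data were aggregated or how the area was computed. The function names `aggregate_mean` and `roc_auc`, the block-averaging scheme, and the 0/1 label encoding (1 = index went up) are all assumptions made for this example.

```python
def aggregate_mean(values, k):
    """Average consecutive blocks of k observations (assumed scheme,
    e.g. k=5 to turn daily values into weekly means).

    Trailing observations that do not fill a whole block are dropped,
    so the aggregated series has len(values) // k points.
    """
    n_blocks = len(values) // k
    return [sum(values[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: sequence of 0/1 outcomes (hypothetical encoding: 1 = up day)
    scores: model outputs, e.g. predicted probabilities from a logistic model
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to a model that ranks no better than chance and 1.0 to a perfect ranking. Note how `aggregate_mean` shrinks the sample by a factor of `k`, which is the effect the abstract cautions about when it says aggregation should not reduce the sample size unduly.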