On the Existence and Significance of Data Preprocessing Biases in Web-Usage Mining

  • Authors:
  • Zhiqiang Zheng;Balaji Padmanabhan;Steven O. Kimbrough

  • Venue:
  • INFORMS Journal on Computing
  • Year:
  • 2003

Abstract

The literature on web-usage mining is replete with data preprocessing techniques, which correspond to many closely related problem formulations. We survey data preprocessing techniques for session-level pattern discovery and compare three of these techniques in the context of understanding session-level purchase behavior on the web. Using real data on the browsing behavior of 20,000 users collected over a period of six months, we build four different models (linear regressions, logistic regressions, neural networks, and classification trees) on data preprocessed with each of the three techniques. The results demonstrate that the three approaches lead to radically different conclusions and provide initial evidence that a data preprocessing bias exists, the effect of which can be significant.