Correlation Preserving Discretization

  • Authors:
  • Sameep Mehta;Srinivasan Parthasarathy;Hui Yang

  • Affiliations:
  • The Ohio State University;The Ohio State University;The Ohio State University

  • Venue:
  • ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Discretization is a crucial preprocessing primitive for a variety of data warehousing and mining tasks. In this article we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate datasets. The algorithm leverages the underlying correlation structure in the dataset to obtain the discrete intervals, and ensures that the inherent correlations are preserved. The approach also extends easily to datasets containing missing values. We demonstrate the efficacy of the approach on real datasets and as a preprocessing step for both classification and frequent itemset mining tasks. We also show that the intervals are meaningful and can uncover hidden patterns in data.