An ICA-Based multivariate discretization algorithm

  • Authors:
  • Ye Kang;Shanshan Wang;Xiaoyan Liu;Hokyin Lai;Huaiqing Wang;Baiqi Miao

  • Affiliations:
  • Department of Information Systems, City University of Hong Kong;Department of Information Systems, City University of Hong Kong;Department of Information Systems, City University of Hong Kong;Department of Information Systems, City University of Hong Kong;Department of Information Systems, City University of Hong Kong;Management School, University of Science and Technology of China, HeFei, AnHui Province

  • Venue:
  • KSEM'06 Proceedings of the First international conference on Knowledge Science, Engineering and Management
  • Year:
  • 2006

Abstract

Discretization is an important preprocessing technique in data mining tasks. Univariate discretization, the most commonly used approach, discretizes one attribute of a dataset at a time and ignores its interactions with the other attributes. Because the target class attribute is determined by several attributes jointly rather than by any single attribute, the result of univariate discretization is not optimal. In this paper, a new multivariate discretization algorithm is proposed. It uses ICA (Independent Component Analysis) to transform the original attributes into a space of independent attributes, and then applies univariate discretization to each attribute in the new space. Data mining tasks can then be conducted on the new discretized dataset with independent attributes. Numerical experiments show that our method improves discretization performance, especially on non-Gaussian datasets, and that it is competitive with a PCA-based multivariate method.
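
The two-stage pipeline described above (ICA transform, then per-attribute univariate discretization) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract names neither the ICA variant nor the univariate discretizer, so this assumes a minimal symmetric FastICA with the tanh (log-cosh) contrast and uses equal-frequency binning as a stand-in discretizer. The function names `fast_ica`, `equal_frequency_bins` and `ica_discretize` and the bin count `k` are hypothetical. In practice, scikit-learn's `FastICA` could replace the hand-written routine.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples x n_features) and whiten it so its covariance is the identity."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # K = E D^{-1/2}; then cov(Xc @ K) = D^{-1/2} E^T C E D^{-1/2} = I.
    K = eigvecs / np.sqrt(eigvals)
    return Xc @ K

def sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(X, max_iter=200, tol=1e-6, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity); returns estimated independent sources."""
    Z = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(max_iter):
        g = np.tanh(Z @ W.T)          # g(w_i^T z) for every sample and component
        g_prime = 1.0 - g ** 2        # derivative of tanh
        # Fixed-point update per row: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate((g.T @ Z) / n - g_prime.mean(axis=0)[:, None] * W)
        # Converged when each new row is aligned (up to sign) with the old one.
        change = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0))
        W = W_new
        if change < tol:
            break
    return Z @ W.T

def equal_frequency_bins(x, k):
    """Stand-in univariate discretizer: map a 1-D array to k bins of roughly equal size."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")   # bin labels 0..k-1

def ica_discretize(X, k=3, seed=0):
    """Project X onto independent components, then discretize each component separately."""
    S = fast_ica(X, seed=seed)
    return np.column_stack([equal_frequency_bins(S[:, j], k) for j in range(S.shape[1])])

# Usage: two non-Gaussian (uniform) sources, linearly mixed into correlated attributes.
rng = np.random.default_rng(1)
sources = rng.uniform(-1.0, 1.0, size=(1000, 2))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = sources @ A.T
D = ica_discretize(X, k=3)
print(D.shape)  # (1000, 2)
```

The uniform sources in the usage example are deliberately non-Gaussian, since ICA can only separate sources that are not Gaussian. A supervised discretizer, such as an entropy-based method, could be dropped in place of `equal_frequency_bins` without changing the rest of the pipeline.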