Building a Decision Cluster Classification Model for High Dimensional Data by a Variable Weighting k-Means Method

  • Authors:
  • Yan Li;Edward Hung;Korris Chung;Joshua Huang

  • Affiliations:
  • Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China;Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China;Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China;E-Business Technology Institute, The University of Hong Kong, Hong Kong, China

  • Venue:
  • AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2008

Abstract

This paper proposes a new classification method, ADCC, for high-dimensional data. In this method, a decision cluster classification (DCC) model consists of a set of disjoint decision clusters, each labeled with a dominant class that determines the class of new objects falling in the cluster. A cluster tree is first generated from a training data set by recursively calling a variable weighting k-means algorithm, and the DCC model is then selected from the tree. The Anderson-Darling test is used to determine the stopping condition for growing the tree. A series of experiments on both synthetic and real data sets showed that ADCC performed better in accuracy and scalability than the existing methods k-NN, decision tree, and SVM. It is particularly suitable for large, high-dimensional data with many classes.
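
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract leaves most details open, so several choices below are assumptions: the weight update follows the W-k-means style (variables with low within-cluster dispersion get larger weights, with exponent `beta` > 1), the initial centers are chosen farthest-first, each split uses k = 2, the Anderson-Darling statistic is computed on the projection of a cluster onto the line joining its two child centers, the cut-off `ad_threshold` is an illustrative value rather than a calibrated critical value, and the DCC model is taken to be simply the leaves of the tree. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter

def sq_dist(a, b, scale=None):
    """Squared Euclidean distance, optionally scaled per variable."""
    scale = scale or [1.0] * len(a)
    return sum(s * (p - q) ** 2 for s, p, q in zip(scale, a, b))

def weighted_kmeans(X, k, beta=2.0, iters=20):
    """W-k-means-style clustering (assumed form); returns (labels, centers, weights)."""
    m = len(X[0])
    # Farthest-first initialisation (an assumption, chosen for determinism).
    centers = [list(X[0])]
    while len(centers) < k:
        centers.append(list(max(X, key=lambda x: min(sq_dist(x, c) for c in centers))))
    w = [1.0 / m] * m
    labels = [0] * len(X)
    for _ in range(iters):
        scale = [wj ** beta for wj in w]
        # Assign each object to the nearest center under the weighted distance.
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c], scale)) for x in X]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Variables with low within-cluster dispersion D[j] receive more weight.
        D = [sum((x[j] - centers[l][j]) ** 2 for x, l in zip(X, labels)) + 1e-12
             for j in range(m)]
        w = [1.0 / sum((D[j] / D[t]) ** (1.0 / (beta - 1.0)) for t in range(m))
             for j in range(m)]
    return labels, centers, w

def anderson_darling_normal(values):
    """Uncorrected A^2 statistic against a normal fitted to the sample."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return float("inf")
    z = sorted((v - mu) / sd for v in values)
    # Standard normal CDF, clamped so the logarithms below stay finite.
    cdf = [min(max(0.5 * (1 + math.erf(v / math.sqrt(2))), 1e-12), 1 - 1e-12) for v in z]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    return -n - s / n

def build_leaves(X, y, min_size=8, ad_threshold=1.0, depth=0, max_depth=6):
    """Grow the cluster tree recursively; return leaves as (center, dominant_class)."""
    center = [sum(col) / len(X) for col in zip(*X)]
    dominant = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or len(X) < min_size or depth >= max_depth:
        return [(center, dominant)]
    labels, centers, _ = weighted_kmeans(X, 2)
    # Project onto the line joining the two child centers and test normality.
    direction = [a - b for a, b in zip(centers[0], centers[1])]
    proj = [sum(d * v for d, v in zip(direction, x)) for x in X]
    if anderson_darling_normal(proj) < ad_threshold:
        return [(center, dominant)]  # looks like one Gaussian cluster: stop growing
    leaves = []
    for c in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == c]
        if not idx or len(idx) == len(X):
            return [(center, dominant)]  # degenerate split: keep the parent as a leaf
        leaves += build_leaves([X[i] for i in idx], [y[i] for i in idx],
                               min_size, ad_threshold, depth + 1, max_depth)
    return leaves

def predict(leaves, x):
    """Return the dominant class of the decision cluster whose center is nearest to x."""
    return min(leaves, key=lambda leaf: sq_dist(leaf[0], x))[1]

# Toy data: two classes separated in the first two variables, noise in the third.
rng = random.Random(0)
X = [[rng.gauss(mu, 1), rng.gauss(mu, 1), rng.gauss(0, 1)]
     for mu in (0, 20) for _ in range(60)]
y = ["a"] * 60 + ["b"] * 60
leaves = build_leaves(X, y)
print(predict(leaves, [0.5, -0.3, 0.1]), predict(leaves, [19.0, 21.0, -0.4]))
```

For simplicity, prediction here uses the unweighted distance to each leaf center; a fuller version would presumably reuse the variable weights learned during clustering. Stopping on a pure node, a small node, or a depth limit is likewise an added safeguard, not something stated in the abstract.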