Building a Decision Cluster Classification Model for High Dimensional Data by a Variable Weighting k-Means Method

Authors:
Yan Li; Edward Hung; Korris Chung; Joshua Huang
Affiliations:
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China; E-Business Technology Institute, The University of Hong Kong, Hong Kong, China
Venue:
AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
Year:
2008

Abstract

In this paper, a new classification method (ADCC) for high dimensional data is proposed. In this method, a decision cluster classification (DCC) model consists of a set of disjoint decision clusters, each labeled with a dominant class that determines the class of new objects falling in that cluster. A cluster tree is first generated from a training data set by recursively calling a variable weighting k-means algorithm, and the DCC model is then selected from the tree. The Anderson-Darling test is used to determine when tree growing stops. A series of experiments on both synthetic and real data sets has shown that ADCC outperformed the existing k-NN, decision tree and SVM methods in both accuracy and scalability. It is particularly suitable for large, high dimensional data with many classes.
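The abstract does not give the update rules of the variable weighting k-means step. The Python sketch below assumes rules in the style of the W-k-means algorithm described in "Automated Variable Weighting in k-Means Type Clustering" (IEEE Transactions on Pattern Analysis and Machine Intelligence): objects are assigned using a distance in which each variable's weight is raised to a power beta, centers are cluster means, and each variable's weight shrinks as its within-cluster dispersion grows. The function name `wkmeans`, the deterministic farthest-first initialization and the `eps` guard are illustrative choices, not details taken from ADCC.

```python
def wkmeans(X, k, beta=2.0, max_iter=100, eps=1e-12):
    """Variable-weighting k-means sketch (W-k-means style update rules assumed).

    X: list of equal-length lists of floats; k: number of clusters;
    beta: weight exponent, assumed > 1 in this sketch.
    Returns (labels, centers, weights); the weights sum to 1.
    """
    if beta <= 1:
        raise ValueError("this sketch assumes beta > 1")
    m = len(X[0])

    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Farthest-first initialization (a simplification, not necessarily
    # ADCC's choice): start at X[0], then repeatedly add the object that
    # is farthest from every center chosen so far.
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(sqdist(x, z) for z in centers))
        centers.append(list(far))

    weights = [1.0 / m] * m  # start with equal variable weights
    labels = None
    for _ in range(max_iter):
        # 1) Centers and weights fixed: assign each object to the center
        #    minimizing sum_j w_j^beta * (x_j - z_j)^2.
        wb = [w ** beta for w in weights]
        new_labels = [
            min(range(k), key=lambda c: sum(wb[j] * (x[j] - centers[c][j]) ** 2 for j in range(m)))
            for x in X
        ]
        # 2) Assignments fixed: move each center to the mean of its members
        #    (an emptied cluster keeps its previous center).
        for c in range(k):
            members = [x for x, lab in zip(X, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3) Assignments and centers fixed: D_j is the within-cluster
        #    dispersion of variable j, and w_j = 1 / sum_t (D_j / D_t)^(1/(beta-1)),
        #    computed here as D_j^(-1/(beta-1)) normalized to sum to 1.
        #    eps avoids division by zero for a variable with no dispersion.
        D = [eps + sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(X, new_labels)) for j in range(m)]
        a = [d ** (-1.0 / (beta - 1)) for d in D]
        weights = [v / sum(a) for v in a]
        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels
    return labels, centers, weights
```

For example, `wkmeans([[0.0, -3.0], [0.2, 3.0], [0.1, 0.0], [100.0, -3.0], [100.2, 3.0], [100.1, 0.0]], 2)` returns the labels `[0, 0, 0, 1, 1, 1]` and a weight of about 0.999 for the first variable: only that variable separates the two groups, while the second has a much larger within-cluster dispersion and is down-weighted.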
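The abstract states that the Anderson-Darling test decides when tree growing stops, but not what is being tested. One plausible reading, assumed here and borrowed from the G-means algorithm rather than taken from ADCC, is to split a node into two children, project the node's objects onto the line joining the two child centers, and stop splitting if that one-dimensional projection cannot be distinguished from a normal distribution. The sketch computes the A² statistic with estimated mean and variance and applies the small-sample factor (1 + 4/n - 25/n²). The 5% critical value 0.787 and the hypothetical `min_size` guard are assumptions to tune.

```python
import math


def _normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def anderson_darling_normal(values):
    """Adjusted A^2 statistic for normality with estimated mean and variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0.0:
        return 0.0  # degenerate node: all projected values identical
    eps = 1e-15  # keep log() arguments away from 0
    z = [min(max(_normal_cdf((v - mean) / sd), eps), 1.0 - eps) for v in sorted(values)]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln z_i + ln(1 - z_{n+1-i})]
    s = sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    return a2 * (1.0 + 4.0 / n - 25.0 / n ** 2)


def project_onto_split_axis(points, c1, c2):
    """Project points onto v = c1 - c2 as x . v / |v|^2 (G-means style)."""
    v = [a - b for a, b in zip(c1, c2)]
    vv = sum(t * t for t in v)
    return [sum(p * t for p, t in zip(x, v)) / vv for x in points]


def should_stop_splitting(projected, critical=0.787, min_size=8):
    """Treat the node as a leaf if its 1-D projection looks normal.

    critical: assumed 5% point for the adjusted statistic.
    min_size: hypothetical guard; tiny nodes are never split.
    """
    if len(projected) < min_size:
        return True
    return anderson_darling_normal(projected) <= critical
```

Under this reading, a node whose projection is bimodal (two well-separated groups) is split further, while a node whose projection is unimodal and roughly bell-shaped becomes a leaf of the cluster tree.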
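Once a set of disjoint leaf clusters has been chosen from the tree as the DCC model (the abstract does not describe the selection procedure, and it is not reproduced here), classification takes two steps: label each cluster with its dominant class, then give a new object the class of the cluster it falls in. The sketch below assumes that "falls in" means "nearest cluster center", optionally measured with per-variable weights such as those a variable weighting k-means produces. The function names `build_decision_clusters` and `classify` are hypothetical.

```python
from collections import Counter


def build_decision_clusters(points, classes, cluster_ids):
    """Return one (center, dominant class) pair per cluster id.

    Ties in the class vote go to the class encountered first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    groups = {}
    for x, y, cid in zip(points, classes, cluster_ids):
        groups.setdefault(cid, []).append((x, y))
    model = []
    for members in groups.values():
        xs = [x for x, _ in members]
        center = [sum(col) / len(xs) for col in zip(*xs)]
        dominant = Counter(y for _, y in members).most_common(1)[0][0]
        model.append((center, dominant))
    return model


def classify(model, x, weights=None):
    """Return the dominant class of the decision cluster whose center is
    nearest to x under an (optionally weighted) squared Euclidean distance."""
    w = weights if weights is not None else [1.0] * len(x)
    _, dominant = min(
        model,
        key=lambda cd: sum(wj * (xj - zj) ** 2 for wj, xj, zj in zip(w, x, cd[0])),
    )
    return dominant
```

For instance, with two clusters centered near (0.33, 0.33) and (10.33, 10.33), the object `[0.0, 100.0]` is closer to the second center under equal weights, but with `weights=[1.0, 0.0]` the second variable is ignored and the object takes the first cluster's dominant class. This is how variable weights can change which decision cluster an object falls in.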