Building a Decision Cluster Forest Model to Classify High Dimensional Data with Multi-classes

Authors:
Yan Li; Edward Hung
Affiliations:
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Venue:
ACML '09 Proceedings of the 1st Asian Conference on Machine Learning: Advances in Machine Learning
Year:
2009

Abstract

In this paper, a decision cluster forest classification model is proposed for high dimensional data with multiple classes. A decision cluster forest (DCF) consists of a set of decision cluster trees, in which the leaves of each tree are clusters labeled with the same class; that label determines the class of new objects falling in those clusters. By recursively calling a variable weighting k-means algorithm, a decision cluster tree can be generated from a subset of the training data that contains the objects in the same class. The set of m decision cluster trees grown from the subsets of m classes constitutes the decision cluster forest. The Anderson-Darling test is used to determine the stopping condition of tree growing. A DCF classification (DCFC) model is selected from all leaves of the m decision cluster trees in the forest. A series of experiments on both synthetic and real data sets has shown that the DCFC model performed better in accuracy and scalability than the single decision cluster tree method and the k-NN, decision tree, and SVM methods. This new model is particularly suitable for large, high dimensional data with many classes.
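
The construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Plain 2-means stands in for the variable weighting k-means algorithm. The Anderson-Darling test is applied to the points projected onto the line joining the two child centers, which is an assumption because the abstract does not say how the test is applied. Every leaf is kept instead of selecting a subset for the DCFC model, and a new object takes the class of its nearest leaf center. All function names, the min_size parameter, and the 5% critical value 0.752 are choices made for this example.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, iters=20):
    """Plain 2-means (assumption: stands in for variable weighting k-means)."""
    center = mean(points)
    c0 = max(points, key=lambda p: dist2(p, center))  # deterministic start
    c1 = max(points, key=lambda p: dist2(p, c0))
    left, right = points, []
    for _ in range(iters):
        left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return (c0, left), (c1, right)

def looks_gaussian(values, critical=0.752):
    """Anderson-Darling normality test, mean and variance estimated (5% level)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    if var == 0:
        return True
    scale = math.sqrt(2 * var)
    cdf = [0.5 * (1 + math.erf((v - mu) / scale)) for v in sorted(values)]
    cdf = [min(max(f, 1e-12), 1 - 1e-12) for f in cdf]  # keep log() finite
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = (-n - total / n) * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2 < critical

def grow(points, min_size=8):
    """Return the leaf centers of one decision cluster tree."""
    if len(points) < min_size:
        return [mean(points)]
    (c0, left), (c1, right) = two_means(points)
    if not left or not right:
        return [mean(points)]
    axis = [a - b for a, b in zip(c0, c1)]
    projected = [sum(x * w for x, w in zip(p, axis)) for p in points]
    if looks_gaussian(projected):  # stopping condition: no evidence of two modes
        return [mean(points)]
    return grow(left, min_size) + grow(right, min_size)

def build_forest(X, y, min_size=8):
    """One tree per class; the forest is the list of (leaf center, class) pairs."""
    leaves = []
    for label in sorted(set(y)):
        subset = [x for x, lab in zip(X, y) if lab == label]
        leaves += [(center, label) for center in grow(subset, min_size)]
    return leaves

def predict(leaves, x):
    """Assign x the class of the nearest leaf center."""
    return min(leaves, key=lambda leaf: dist2(leaf[0], x))[1]

# Toy data: class "a" has two separated groups, class "b" has one.
rng = random.Random(0)

def blob(cx, cy, n=40):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]

X = blob(0, 0) + blob(10, 0) + blob(5, 8)
y = ["a"] * 80 + ["b"] * 40
leaves = build_forest(X, y)
print(predict(leaves, (0.0, 0.0)), predict(leaves, (5.0, 8.0)))
```

Growing one tree per class means each tree only has to describe the shape of its own class, so a class made of several separated groups ends up with several leaves. In the toy data, class "a" is deliberately bimodal so that its tree splits at least once.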