A supervised clustering and classification algorithm for mining data with mixed variables

Authors:
Xiangyang Li;Nong Ye
Affiliations:
Dept. of Ind. & Manuf. Syst. Eng., Univ. of Michigan, Dearborn, MI, USA;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Year:
2006

Abstract

This paper presents a data mining algorithm based on supervised clustering that learns data patterns and uses them for data classification. The algorithm enables scalable, incremental learning of patterns from data with both numeric and nominal variables. Two methods of combining numeric and nominal variables when calculating the distance between clusters are investigated. In the first method, separate distance measures are calculated for the numeric and the nominal variables and are then combined into an overall distance measure. In the second method, nominal variables are converted into numeric variables, and a single distance measure is then calculated over all variables. We analyze the computational complexity, and thus the scalability, of the algorithm and test its performance on a number of data sets from various application domains. The prediction accuracy and reliability of the algorithm are analyzed, tested, and compared with those of several other data mining algorithms.
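
The two ways of combining numeric and nominal variables described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which distance measures, weights, or encoding the authors use, so everything concrete here is an assumption made for illustration: Euclidean distance for numeric variables, a simple mismatch count for nominal variables, a hypothetical weight `w_nom` for combining the two, and one-hot encoding as the nominal-to-numeric conversion. The function names are likewise hypothetical.

```python
import math


def separate_distance(a_num, a_nom, b_num, b_nom, w_nom=1.0):
    """First method: one distance per variable type, then combine.

    Assumed for illustration: Euclidean distance on the numeric part,
    mismatch count on the nominal part, and a weighted sum to combine them.
    """
    d_num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a_num, b_num)))
    d_nom = sum(1 for x, y in zip(a_nom, b_nom) if x != y)
    return d_num + w_nom * d_nom


def one_hot(value, categories):
    """Encode one nominal value as a 0/1 vector over its known categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def converted_distance(a_num, a_nom, b_num, b_nom, categories):
    """Second method: convert nominal values to numbers, then use one distance.

    `categories` holds one list of possible values per nominal variable.
    One-hot encoding is an assumed conversion, not necessarily the paper's.
    """
    a = list(a_num)
    b = list(b_num)
    for val_a, val_b, cats in zip(a_nom, b_nom, categories):
        a.extend(one_hot(val_a, cats))
        b.extend(one_hot(val_b, cats))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical points, each with two numeric and two nominal variables.
p_num, p_nom = [0.0, 3.0], ["red", "tcp"]
q_num, q_nom = [4.0, 0.0], ["blue", "tcp"]
cats = [["red", "blue", "green"], ["tcp", "udp"]]

d_separate = separate_distance(p_num, p_nom, q_num, q_nom)
d_converted = converted_distance(p_num, p_nom, q_num, q_nom, cats)
```

In this sketch the two methods generally give different values for the same pair of points, because a nominal mismatch contributes linearly in the first and inside the square root in the second. How the numeric and nominal parts are weighted or scaled relative to each other is left open here.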