A non-parametric method for data clustering with optimal variable weighting

Authors:
Ji-Won Chung;In-Chan Choi
Affiliations:
Department of Industrial Systems and Information Engineering, Korea University, Anamdong, Seongbookku, Seoul, Republic of Korea;Department of Industrial Systems and Information Engineering, Korea University, Anamdong, Seongbookku, Seoul, Republic of Korea
Venue:
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Year:
2006

Abstract

Cluster analysis in data mining often deals with large-scale, high-dimensional data that contain masking variables, so removing non-contributing variables is important both for accurate cluster recovery and for proper interpretation of the clustering results. Although the weights obtained by variable weighting methods can be used for variable selection (or elimination), the weights alone rarely give clear guidance on which variables to keep for subsequent analysis. In addition, variable selection and variable weighting are closely interrelated with the choice of the number of clusters. In this paper, we propose a non-parametric data clustering method, based on W-k-means type clustering, that makes an automated and joint decision on selecting variables, determining variable weights, and choosing the number of clusters. Conclusions are drawn from computational experiments with random data and real-life data.
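
For orientation, the W-k-means baseline that the abstract builds on can be sketched roughly as follows. This is a minimal illustration of the commonly stated weighted k-means scheme (assign points, update centers, update variable weights), not the joint selection method proposed in the paper. The function name `w_kmeans`, its parameters, the optional `init_centers` argument, and the uniform-weight fallback are assumptions made for this example.

```python
import numpy as np


def w_kmeans(X, k, beta=2.0, n_iter=50, init_centers=None, seed=0):
    """Illustrative W-k-means sketch: k-means with automated variable weights.

    Hypothetical helper for illustration only. Assumes squared Euclidean
    distance per variable and beta > 1.
    """
    if beta <= 1.0:
        raise ValueError("this sketch assumes beta > 1")
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    rng = np.random.default_rng(seed)
    if init_centers is None:
        # Assumption: start from k distinct data points chosen at random.
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init_centers, dtype=float).copy()
    weights = np.full(m, 1.0 / m)  # start with equal variable weights
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: nearest center under weighted squared distance.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, m)
        labels = (diff2 * weights ** beta).sum(axis=2).argmin(axis=1)

        # Center step: mean of each non-empty cluster.
        for l in range(k):
            members = X[labels == l]
            if len(members) > 0:
                centers[l] = members.mean(axis=0)

        # Weight step: D[j] is the within-cluster dispersion of variable j.
        D = ((X - centers[labels]) ** 2).sum(axis=0)
        pos = D > 0
        if pos.any():
            # w_j = 1 / sum_t (D_j / D_t)^(1 / (beta - 1)) over variables
            # with D_t > 0; variables with D_j = 0 receive weight 0.
            ratios = (D[pos][:, None] / D[pos][None, :]) ** (1.0 / (beta - 1.0))
            weights = np.zeros(m)
            weights[pos] = 1.0 / ratios.sum(axis=1)
        else:
            # Assumption: fall back to uniform weights if every D_j is 0.
            weights = np.full(m, 1.0 / m)

    return labels, centers, weights
```

Calling `labels, centers, weights = w_kmeans(X, k=3)` on an `(n, m)` array returns one weight per variable, summing to 1. A variable whose within-cluster dispersion is large relative to the others receives a small weight, but the sketch sets no threshold for dropping a variable and takes `k` as given, which are exactly the decisions the abstract says the weights alone do not settle.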