In clustering algorithms, choosing a subset of representative examples from the data set is very important. Such "exemplars" can be found by randomly choosing an initial subset of data objects and then iteratively refining it, but this works well only if the initial choice is close to a good solution. In this paper, the average density of an object is defined based on the frequency of its attribute values. A novel initialization method for categorical data is then proposed, which considers both the distance between objects and the density of each object. The proposed initialization method is also applied to the k-modes algorithm and the fuzzy k-modes algorithm. Experimental results show that the proposed method is superior to random initialization and, because its time complexity is linear in the number of data objects, can be applied to large data sets.
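The abstract does not give the exact formulas, so the following Python sketch is only one plausible reading of a density-and-distance initialization, not the authors' definitive procedure. It assumes that the average density of an object is the mean relative frequency of its attribute values, that distance is simple matching (Hamming) dissimilarity, and that each further center maximizes the product of density and the minimum distance to the centers already chosen. The names `average_density`, `simple_matching`, and `init_centers` are hypothetical.

```python
from collections import Counter


def average_density(data):
    """Mean relative frequency of each object's attribute values (assumed definition)."""
    n, m = len(data), len(data[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (m * n) for row in data]


def simple_matching(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def init_centers(data, k):
    """Pick k initial centers using density and distance (illustrative sketch)."""
    dens = average_density(data)
    # First center: the densest object (ties go to the earliest object).
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            # Favor objects that are both dense and far from every chosen center.
            return min(simple_matching(data[i], data[c]) for c in chosen) * dens[i]
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


# Toy categorical data set, made up for illustration only.
toy = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
centers = init_centers(toy, 2)
```

For a fixed number of clusters and attributes, each pass in this sketch scans the data once, so the cost grows linearly with the number of objects, in line with the complexity claim in the abstract. The returned objects could then serve as the initial modes for k-modes or fuzzy k-modes.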