The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the clustering quality of k-modes-type algorithms depends on the initial cluster centers, and the number of clusters must be known or specified in advance. This paper proposes a novel initialization method for categorical data that can be used with k-modes-type algorithms. The proposed method not only yields good initial cluster centers but also provides a criterion for finding candidate values for the number of clusters. Its performance and scalability have been studied on real data sets. The experimental results show that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets.
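
To make the role of the initial centers concrete, the following is a minimal sketch of a generic k-modes loop using the simple matching (Hamming) dissimilarity. It is not the initialization method proposed in the paper, which the abstract does not specify. The function names (`hamming`, `column_modes`, `k_modes`, `total_cost`), the toy data, and the choice of initial centers are all assumptions made for illustration. The point is only that both the initial centers and the number of clusters (the length of `initial_centers`) are inputs the caller must supply.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute over a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_centers, max_iter=100):
    """Generic k-modes: alternate nearest-mode assignment and mode update.

    The number of clusters is len(initial_centers); both it and the
    centers themselves must be chosen by the caller (illustrative sketch).
    """
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the closest center (ties go to the lower index).
        new_labels = [
            min(range(len(centers)), key=lambda j: hamming(row, centers[j]))
            for row in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Recompute each center as the attribute-wise mode of its members.
        for j in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = column_modes(members)
    return labels, centers


def total_cost(data, labels, centers):
    """Sum of dissimilarities between each record and its cluster center."""
    return sum(hamming(row, centers[lab]) for row, lab in zip(data, labels))


# Hypothetical toy data: six records with three categorical attributes.
data = [
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("a", "y", "p"),
    ("b", "z", "r"),
    ("b", "z", "s"),
    ("b", "w", "r"),
]

# Initial centers picked by hand here; a k-modes-type algorithm needs some
# such choice, which is the step an initialization method would automate.
labels, centers = k_modes(data, initial_centers=[data[0], data[3]])
cost = total_cost(data, labels, centers)
```

Each pass over the data costs time proportional to the number of records times the number of clusters and attributes, which is why k-modes-type algorithms scale well once reasonable starting centers are available.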