The k-modes algorithm and its modified versions are widely used to cluster categorical data. However, in the iterative process of these algorithms, the updating formulae for the partition matrix, cluster centers, and attribute weights are computed from within-cluster information only. Between-cluster information is not considered, which may produce clusterings with weak separation among clusters. In this paper, we therefore propose a new term that reflects this separation. New optimization objective functions are then developed by adding the proposed term to the objective functions of several existing k-modes algorithms. Under this optimization framework, the corresponding updating formulae and the convergence of the iterative process are rigorously derived. These improvements enhance the effectiveness of the existing k-modes algorithms while keeping them simple. Experimental studies on real data sets from the UCI (University of California, Irvine) Machine Learning Repository show that the improved algorithms outperform their original counterparts in clustering categorical data sets. They also scale to large data sets, because their time complexity is linear with respect to the number of data objects, attributes, or clusters.
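
To make the baseline concrete, the following is a minimal sketch of a standard k-modes iteration, assuming the simple matching dissimilarity and a most-frequent-value mode update. It uses within-cluster information only, which is the limitation described above. The function names and the toy data are hypothetical, and the proposed between-cluster term is not implemented here because the abstract does not give its form.

```python
from collections import Counter


def matching_dissimilarity(x, y):
    """Simple matching: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def k_modes(data, initial_modes, max_iter=100):
    """Baseline k-modes sketch (within-cluster information only).

    data: list of equal-length tuples of categorical values.
    initial_modes: list of k tuples used as starting cluster centers.
    Returns (labels, modes, cost).
    """
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest mode
        # (ties go to the lowest cluster index).
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # partition unchanged, so the iteration has converged
        labels = new_labels
        # Update step: per attribute, take the most frequent value
        # among the members of each cluster.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if not members:
                continue  # assumption: keep the old mode for an empty cluster
            modes[j] = tuple(
                Counter(column).most_common(1)[0][0]
                for column in zip(*members)
            )
    # Within-cluster objective: total dissimilarity of objects to their modes.
    cost = sum(matching_dissimilarity(x, modes[lab])
               for x, lab in zip(data, labels))
    return labels, modes, cost


# Hypothetical toy data: six objects, three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, modes, cost = k_modes(data, initial_modes=[data[0], data[3]])
print(labels, modes, cost)
```

Each pass over the data costs time proportional to the number of objects times the number of attributes times the number of clusters, which is consistent with the linear scaling mentioned in the abstract. Note that nothing in the objective rewards modes for being far apart from one another.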