Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. Clustering categorical data has recently attracted much attention from the data mining research community. As an extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes. In this paper, the limitations of the simple matching dissimilarity measure and Ng's dissimilarity measure are analyzed using illustrative examples. Based on the idea of biological and genetic taxonomy and the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined. A distinct characteristic of the new dissimilarity measure is that it takes into account the distribution of attribute values over the whole universe. A convergence study and a time complexity analysis of the k-Modes algorithm based on the new dissimilarity measure indicate that it can be used effectively on large data sets. Comparative experiments on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure, especially on data sets with biological and genetic taxonomy information.
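For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of k-Modes with the simple matching dissimilarity, written in Python. It is not the new measure proposed in the paper, whose definition is not given in the abstract. The function names (`simple_matching`, `mode_of`, `k_modes`), the tie-breaking rule, and the requirement that the caller supplies the initial modes are all assumptions made for illustration.

```python
from collections import Counter


def simple_matching(x, y):
    """Count the attributes on which two categorical objects differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(cluster):
    """Attribute-wise most frequent value of a non-empty cluster.

    Assumption: ties are broken in favour of the value seen first.
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


def k_modes(data, initial_modes, max_iter=100):
    """Alternate assignment and mode update until the labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute the mode of every non-empty cluster.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = mode_of(members)
    return labels, modes


# Hypothetical toy data: four objects described by two categorical attributes.
data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
labels, modes = k_modes(data, initial_modes=[("a", "x"), ("b", "z")])
```

Simple matching treats every mismatch as equally costly, regardless of how attribute values are distributed over the data set. That is the property the abstract identifies as a limitation.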