In cluster analysis, one of the most challenging problems is determining the number of clusters in a data set, which is a basic input parameter for most clustering algorithms. Many algorithms have been proposed to solve this problem for either numerical or categorical data sets, but they are not very effective on mixed data sets that contain both numerical and categorical attributes. To overcome this deficiency, this paper presents a generalized mechanism that integrates Rényi entropy and complement entropy. The mechanism can uniformly characterize within-cluster entropy and between-cluster entropy, and can identify the worst cluster in a mixed data set. To evaluate clustering results for mixed data, the paper also defines an effective cluster validity index. Furthermore, by introducing a new dissimilarity measure into the k-prototypes algorithm, we develop an algorithm to determine the number of clusters in a mixed data set. The performance of the algorithm has been studied on several synthetic and real-world data sets. Comparisons with other clustering algorithms show that the proposed algorithm is more effective at detecting the optimal number of clusters and generates better clustering results.
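To make the two entropy measures named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common textbook forms: complement entropy of a categorical attribute as the sum of p_i(1 - p_i) over its value frequencies, and quadratic Rényi entropy of a numerical attribute estimated with a Gaussian Parzen window. The function names, the kernel width `sigma`, and the toy cluster values are all illustrative assumptions; how the paper combines the two measures across attributes is not stated in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def complement_entropy(values):
    """Complement entropy of one categorical attribute (assumed form):
    sum over distinct values of p * (1 - p), where p is the relative frequency."""
    n = len(values)
    counts = Counter(values)
    return sum((c / n) * (1.0 - c / n) for c in counts.values())


def renyi_quadratic_entropy(xs, sigma=1.0):
    """Quadratic Renyi entropy of one numerical attribute, estimated with a
    Gaussian Parzen window of width sigma (an assumed parameter):
    H = -log( (1 / N^2) * sum_i sum_j G(x_i - x_j; 2 * sigma^2) )."""
    n = len(xs)
    var = 2.0 * sigma ** 2  # variance of the convolved Gaussian kernel
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = sum(
        norm * math.exp(-((a - b) ** 2) / (2.0 * var)) for a in xs for b in xs
    )
    return -math.log(total / n ** 2)


# Hypothetical toy clusters: one homogeneous, one heterogeneous.
tight_cat = ["red", "red", "red", "red"]
loose_cat = ["red", "blue", "green", "red"]
tight_num = [1.0, 1.1, 0.9, 1.0]
loose_num = [0.0, 5.0, 10.0, 15.0]

print(complement_entropy(tight_cat), complement_entropy(loose_cat))
print(renyi_quadratic_entropy(tight_num), renyi_quadratic_entropy(loose_num))
```

Under these assumptions, both measures are lower for the homogeneous cluster than for the heterogeneous one, which is the property a within-cluster entropy needs in order to flag a poorly formed cluster.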