Because high-dimensional data are sparse and contain redundant attributes, clusters of objects often exist in subspaces rather than in the entire space. To address this issue, this paper presents a new optimization algorithm for clustering high-dimensional categorical data, which extends the k-modes clustering algorithm. The proposed algorithm uses a novel weighting technique for categorical data that calculates two weights for each attribute (or dimension) in each cluster and uses the weight values to identify the subsets of important attributes that categorize different clusters. The convergence of the algorithm under an optimization framework is proved. The performance and scalability of the algorithm are evaluated experimentally on both synthetic and real data sets. The experimental studies show that the proposed algorithm is effective in clustering categorical data sets and scales to large data sets owing to its linear time complexity with respect to the number of data objects, attributes, or clusters.
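To make the general idea concrete, the following is a minimal sketch of a k-modes loop with per-cluster attribute weights. It is not the algorithm proposed in the paper: the abstract does not give the weighting formulas, and the paper computes two weights per attribute per cluster, whereas this sketch uses a single, assumed heuristic weight (the fraction of cluster members that agree with the cluster mode on an attribute, normalized to sum to 1). The function names `mode_of` and `weighted_k_modes` and the toy data are hypothetical, and no convergence guarantee is claimed for this heuristic.

```python
from collections import Counter


def mode_of(values):
    """Return the most frequent category (ties go to the first one seen)."""
    return Counter(values).most_common(1)[0][0]


def weighted_k_modes(data, init_modes, n_iter=10):
    """Illustrative k-modes variant with one weight per attribute per cluster.

    ASSUMPTION: the weight update below is a simple heuristic chosen for
    illustration, not the weighting technique described in the paper.
    """
    k = len(init_modes)
    m = len(data[0])
    modes = [list(mode) for mode in init_modes]
    # Start with uniform attribute weights in every cluster.
    weights = [[1.0 / m] * m for _ in range(k)]
    labels = None

    for _ in range(n_iter):
        # Assignment step: weighted simple-matching dissimilarity to each mode.
        new_labels = []
        for x in data:
            dists = [
                sum(w for w, a, b in zip(weights[c], x, modes[c]) if a != b)
                for c in range(k)
            ]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable, so stop iterating
        labels = new_labels

        # Update step: recompute modes and attribute weights per cluster.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # keep the previous mode and weights for an empty cluster
            modes[c] = [mode_of([x[j] for x in members]) for j in range(m)]
            # Fraction of members matching the mode on each attribute.
            agree = [
                sum(x[j] == modes[c][j] for x in members) / len(members)
                for j in range(m)
            ]
            total = sum(agree)
            weights[c] = [a / total for a in agree]

    return labels, modes, weights


# Toy data: the first two attributes separate the clusters, the third is noise.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "r"),
    ("b", "y", "p"), ("b", "y", "q"), ("b", "y", "r"),
]
labels, modes, weights = weighted_k_modes(data, init_modes=[data[0], data[3]])
```

On this toy data the noisy third attribute ends up with a smaller weight than the first two in both clusters, which is the kind of per-cluster subspace information the abstract refers to. Each iteration of the sketch does work proportional to the number of objects times attributes times clusters.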