We propose a parameter-free, fully automatic approach to clustering high-dimensional categorical data. The technique is a two-phase iterative procedure that attempts to improve the overall quality of the partition. In the first phase, cluster assignments are taken as given, and a new cluster is added to the partition by choosing and splitting a low-quality cluster. In the second phase, the number of clusters is held fixed, and the algorithm attempts to optimize cluster assignments. As a result, the number of clusters is not specified in advance but is established by the inherent features of the underlying dataset. Furthermore, the approach is parametric with respect to the notion of cluster quality: here, a cluster is defined as a set of tuples exhibiting a form of homogeneity. We show how a suitable notion of cluster homogeneity can be defined for high-dimensional categorical data, from which an effective instance of the proposed clustering scheme follows directly. Experiments on both synthetic and real data show that the resulting algorithm scales linearly and achieves near-optimal results in terms of compactness and separation.
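The two-phase loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The homogeneity score in `cluster_quality`, the choice of the first tuple as the split seed, the greedy single-tuple moves, and all function names are assumptions made for illustration only; the abstract does not define them. The sketch also favours readability over the linear scaling reported in the abstract, and it assumes a non-empty list of equal-length tuples.

```python
from collections import Counter

EPS = 1e-12  # tolerance so floating-point noise never counts as an improvement


def squared_freq_sum(rows):
    """Sum, over all attributes, of the squared relative frequency of each value."""
    n = len(rows)
    return sum((count / n) ** 2
               for column in zip(*rows)
               for count in Counter(column).values())


def cluster_quality(cluster, data):
    """ASSUMED homogeneity score (not taken from the abstract): how much more
    concentrated attribute values are inside the cluster than in the whole
    dataset, weighted by the squared relative size of the cluster."""
    weight = (len(cluster) / len(data)) ** 2
    return weight * (squared_freq_sum(cluster) - squared_freq_sum(data))


def partition_quality(clusters, data):
    """Overall quality of a partition: sum of the per-cluster scores."""
    return sum(cluster_quality(c, data) for c in clusters)


def stabilize(clusters, data):
    """Second phase: the number of clusters is fixed; move single tuples
    between clusters for as long as a move improves the partition quality."""
    improved = True
    while improved:
        improved = False
        for i in range(len(clusters)):
            for row in list(clusters[i]):
                if len(clusters[i]) == 1:
                    break  # never empty a cluster, so the count stays fixed
                for j in range(len(clusters)):
                    if j == i:
                        continue
                    before = partition_quality(clusters, data)
                    clusters[i].remove(row)
                    clusters[j].append(row)
                    if partition_quality(clusters, data) > before + EPS:
                        improved = True
                        break  # keep the move, go on to the next tuple
                    clusters[j].pop()        # undo the move
                    clusters[i].append(row)
    return clusters


def two_phase_clustering(data):
    """Alternate the split phase and the stabilization phase until no split
    of any cluster improves the overall partition quality."""
    clusters = [list(data)]
    while True:
        current = partition_quality(clusters, data)
        accepted = None
        # First phase: try to split clusters, lowest-quality cluster first.
        order = sorted(range(len(clusters)),
                       key=lambda i: cluster_quality(clusters[i], data))
        for i in order:
            if len(clusters[i]) < 2:
                continue
            candidate = [list(c) for k, c in enumerate(clusters) if k != i]
            # Assumed split rule: seed the new cluster with the first tuple.
            candidate += [clusters[i][1:], clusters[i][:1]]
            stabilize(candidate, data)
            if partition_quality(candidate, data) > current + EPS:
                accepted = candidate
                break
        if accepted is None:
            return clusters  # no split helps: the cluster count is settled
        clusters = accepted


if __name__ == "__main__":
    toy = [("a", "x")] * 4 + [("b", "y")] * 4
    for cluster in two_phase_clustering(toy):
        print(len(cluster), set(cluster))
```

The number of clusters is never passed in: the loop stops on its own once splitting any cluster would lower the assumed quality score. Swapping in a different `cluster_quality` changes the notion of homogeneity without touching the rest of the scheme, which mirrors the sense in which the approach is parametric with respect to cluster quality.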