Semi-supervised parameter-free divisive hierarchical clustering of categorical data

Authors:
Tengke Xiong (Department of Computer Science, University of Sherbrooke)
Shengrui Wang (Department of Computer Science, University of Sherbrooke)
André Mayers (Department of Computer Science, University of Sherbrooke)
Ernest Monga (Department of Mathematics, University of Sherbrooke, Sherbrooke, QC, Canada)
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Year:
2011

Abstract

Semi-supervised clustering can yield considerable improvement over unsupervised clustering. Most existing semi-supervised clustering algorithms are non-hierarchical, derived from the k-means algorithm, and designed for numeric data. Clustering categorical data is challenging because such data lack an inherently meaningful similarity measure, and semi-supervised clustering in the categorical domain remains unexplored. In this paper, we propose a novel semi-supervised divisive hierarchical algorithm for categorical data. Our algorithm is parameter-free and fully automatic, and it effectively exploits background knowledge in the form of instance-level constraints to improve the quality of the resulting dendrogram. Experiments on real-life data demonstrate the promising performance of our algorithm.
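
To make "instance-level constraints" concrete: in the semi-supervised clustering literature they are commonly given as must-link pairs (two objects should share a cluster) and cannot-link pairs (two objects should be separated). The abstract does not say how the proposed algorithm encodes or uses them, so the following is only a minimal sketch under that assumption. The function name `count_violations` and the toy data are hypothetical and are not taken from the paper; the sketch merely scores two candidate bipartitions of a small categorical data set against the constraints.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two objects in the data set


def count_violations(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> int:
    """Count how many pairwise constraints a cluster assignment breaks.

    Hypothetical helper for illustration only, not the paper's method.
    """
    # A must-link pair is broken when its two objects land in different clusters.
    broken_ml = sum(1 for a, b in must_link if labels[a] != labels[b])
    # A cannot-link pair is broken when its two objects land in the same cluster.
    broken_cl = sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return broken_ml + broken_cl


# Toy categorical objects described by (colour, size); values are made up.
objects = [
    ("red", "small"),   # index 0
    ("red", "large"),   # index 1
    ("blue", "small"),  # index 2
    ("blue", "large"),  # index 3
]

# Assumed background knowledge: objects 0 and 1 belong together,
# objects 1 and 2 do not.
must_link = [(0, 1)]
cannot_link = [(1, 2)]

# Two candidate bipartitions, as a divisive step might consider.
split_by_colour = [0, 0, 1, 1]
split_by_size = [0, 1, 0, 1]

print(count_violations(split_by_colour, must_link, cannot_link))
print(count_violations(split_by_size, must_link, cannot_link))
```

Here the split by colour satisfies both constraints, while the split by size separates the must-link pair, so a constraint-aware divisive step would have a reason to prefer the former.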