Semi-supervised clustering can yield considerable improvement over unsupervised clustering. Most existing semi-supervised clustering algorithms are non-hierarchical, derived from the k-means algorithm, and designed for numeric data. Clustering categorical data is challenging because it lacks an inherently meaningful similarity measure, and semi-supervised clustering in the categorical domain remains unexplored. In this paper, we propose a novel semi-supervised divisive hierarchical algorithm for categorical data. Our algorithm is parameter-free and fully automatic, and it effectively exploits background knowledge in the form of instance-level constraints to improve the quality of the resulting dendrogram. Experiments on real-life data demonstrate the promising performance of our algorithm.
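To make the notion of instance-level constraints concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the constraints take the common must-link / cannot-link form (pairs of objects that should, or should not, share a cluster) and counts how many of them a candidate split of categorical records would violate. The record identifiers, attribute values, and the function name `count_violations` are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Mapping, Tuple

Pair = Tuple[Hashable, Hashable]


def count_violations(
    assignment: Mapping[Hashable, Hashable],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> int:
    """Count the instance-level constraints broken by a cluster assignment."""
    # A must-link pair is violated when its two objects land in different clusters.
    broken_ml = sum(1 for a, b in must_link if assignment[a] != assignment[b])
    # A cannot-link pair is violated when its two objects share a cluster.
    broken_cl = sum(1 for a, b in cannot_link if assignment[a] == assignment[b])
    return broken_ml + broken_cl


# Hypothetical categorical records: (colour, size).
records = {
    "r1": ("red", "small"),
    "r2": ("red", "large"),
    "r3": ("blue", "small"),
    "r4": ("blue", "large"),
}

# Assumed background knowledge supplied by a user.
must_link = [("r1", "r2")]
cannot_link = [("r1", "r3")]

# Two candidate bipartitions: split on the first or on the second attribute.
split_by_colour = {rid: attrs[0] for rid, attrs in records.items()}
split_by_size = {rid: attrs[1] for rid, attrs in records.items()}

violations_colour = count_violations(split_by_colour, must_link, cannot_link)
violations_size = count_violations(split_by_size, must_link, cannot_link)
print(violations_colour, violations_size)
```

In this toy setting the split on colour satisfies both constraints while the split on size breaks both, so a constraint-aware divisive procedure would plausibly prefer the former. How the paper's algorithm actually incorporates constraints when choosing a split is not stated in the abstract.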