Semi-supervised parameter-free divisive hierarchical clustering of categorical data

  • Authors: Tengke Xiong; Shengrui Wang; André Mayers; Ernest Monga
  • Affiliations: Department of Computer Science, University of Sherbrooke; Department of Computer Science, University of Sherbrooke; Department of Computer Science, University of Sherbrooke; Department of Mathematics, University of Sherbrooke, Sherbrooke, QC, Canada
  • Venue: PAKDD'11: Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Volume Part I
  • Year: 2011

Abstract

Semi-supervised clustering can yield considerable improvement over unsupervised clustering. Most existing semi-supervised clustering algorithms are non-hierarchical, derived from the k-means algorithm, and designed for numeric data. Clustering categorical data is challenging because such data lack an inherently meaningful similarity measure, and semi-supervised clustering in the categorical domain remains unexplored. In this paper, we propose a novel semi-supervised divisive hierarchical algorithm for categorical data. Our algorithm is parameter-free, fully automatic, and effective in exploiting background knowledge given as instance-level constraints to improve the quality of the resulting dendrogram. Experiments on real-life data demonstrate the promising performance of our algorithm.
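
As an illustration only: instance-level constraints are commonly expressed as must-link and cannot-link pairs of objects. The abstract does not say how the proposed algorithm uses them, so the sketch below is not the authors' method. It only shows, on made-up toy data, how one might count the constraints that a candidate two-way split violates, which is the kind of signal a divisive procedure could consult when choosing among splits. The function name `count_violations`, the toy records, and the constraint pairs are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def count_violations(
    sides: Dict[int, int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> int:
    """Count instance-level constraints violated by a two-way split.

    `sides` maps an object id to the side (0 or 1) it is assigned to.
    Constraints that mention an object outside the split are ignored.
    """
    violated = 0
    for a, b in must_link:
        # A must-link pair is violated when the split separates it.
        if a in sides and b in sides and sides[a] != sides[b]:
            violated += 1
    for a, b in cannot_link:
        # A cannot-link pair is violated when both land on the same side.
        if a in sides and b in sides and sides[a] == sides[b]:
            violated += 1
    return violated


# Hypothetical categorical records (assumption: invented for illustration).
records = {
    0: ("red", "small", "round"),
    1: ("red", "small", "square"),
    2: ("blue", "large", "round"),
    3: ("blue", "large", "square"),
}

# Hypothetical background knowledge.
must_link = [(0, 1)]    # objects 0 and 1 belong together
cannot_link = [(1, 2)]  # objects 1 and 2 must be separated

# Two candidate bisections of the same four objects.
split_a = {0: 0, 1: 0, 2: 1, 3: 1}
split_b = {0: 0, 1: 1, 2: 1, 3: 0}

violations_a = count_violations(split_a, must_link, cannot_link)
violations_b = count_violations(split_b, must_link, cannot_link)
print(violations_a, violations_b)
```

Here `split_a` respects both constraints while `split_b` breaks both, so a constraint-aware procedure would have a reason to prefer the first split. How such a count should be weighed against an unsupervised split criterion is a design choice that this sketch leaves open.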