Hierarchical density-based clustering of categorical data and a simplification

  • Authors:
  • Bill Andreopoulos; Aijun An; Xiaogang Wang

  • Affiliations:
  • York University, Dept. of Computer Science and Engineering, Toronto, Ontario, Canada (all three authors)

  • Venue:
  • PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2007

Abstract

A challenge in applying density-based clustering to categorical datasets is that the 'cube' of attribute values has no defined ordering. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data. HIERDENC offers a basis for designing simpler clustering algorithms that balance the tradeoff between accuracy and speed. The characteristics of HIERDENC include: (i) it builds a hierarchy representing the underlying cluster structure of the categorical dataset, (ii) it minimizes the number of user-specified input parameters, (iii) it is insensitive to the order of object input, and (iv) it can handle outliers. We evaluate HIERDENC on low-dimensional standard categorical datasets, on which it produces more accurate results than the other algorithms compared. We present a faster simplification of HIERDENC called the MULIC algorithm. MULIC performs better than subspace clustering algorithms at finding the multi-layered structure of special datasets.
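The abstract does not say how density is measured when attribute values have no ordering. One common way around the missing order, assumed here for illustration and not taken from the paper, is to define the neighborhood of radius r around an object as every object that differs from it in at most r attributes (a Hamming-distance "hypercube"), and to call an object dense when that neighborhood is large. The helper names `hamming`, `neighborhood`, and `densest_object` are hypothetical; this is a minimal sketch of the idea, not the HIERDENC procedure itself, whose exact density definition and hierarchy-building steps may differ.

```python
from collections.abc import Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Count attribute positions where two categorical objects take different values."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


def neighborhood(data: Sequence[Sequence[str]], center: Sequence[str], r: int) -> list[int]:
    """Indices of objects differing from `center` in at most r attributes.

    Assumption: with no ordering on categorical values, closeness is judged
    only by matching versus mismatching values, never by magnitude.
    """
    return [i for i, obj in enumerate(data) if hamming(obj, center) <= r]


def densest_object(data: Sequence[Sequence[str]], r: int) -> tuple[int, int]:
    """Return (index, size) of the object whose radius-r neighborhood is largest.

    Ties are broken in favor of the earliest object in `data`.
    """
    counts = [len(neighborhood(data, obj, r)) for obj in data]
    best = max(range(len(data)), key=lambda i: counts[i])
    return best, counts[best]


if __name__ == "__main__":
    # Hypothetical toy dataset with three categorical attributes: color, size, shape.
    toy = [
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("red", "large", "round"),
        ("blue", "large", "square"),
    ]
    print(densest_object(toy, r=1))
```

On the toy data, the first object has two neighbors that each differ in a single attribute, so with r = 1 it is the densest seed. A hierarchical scheme along these lines could then grow clusters from such seeds while gradually increasing r, which is one way a hierarchy of clusters can emerge; the tie-breaking and the radius schedule here are design choices of the sketch, not details stated in the abstract.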