A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities

  • Authors:
  • Noha A. Yousri; Mohamed S. Kamel; Mohamed A. Ismail

  • Affiliations:
  • Computers and System Engineering, University of Alexandria, Egypt and Electrical and Computer Engineering, University of Waterloo, Ontario, Canada; Electrical and Computer Engineering, University of Waterloo, Ontario, Canada; Computers and System Engineering, University of Alexandria, Egypt

  • Venue:
  • Pattern Recognition
  • Year:
  • 2009

Abstract

It is important to find the natural clusters in high-dimensional data, where visualization becomes difficult. A natural cluster may have any shape and density; it should not be restricted to a globular shape, as many algorithms assume, or to a specific user-defined density, as some density-based algorithms require. This work addresses the problem by maximizing the relatedness of distances between patterns in the same cluster, which makes it possible to distinguish clusters by their distance-based densities. A novel dynamic model is proposed, based on new distance-relatedness measures and clustering criteria. The proposed algorithm, "Mitosis", discovers clusters of arbitrary shapes and arbitrary densities in high-dimensional data, and its computational complexity compares favorably with that of related algorithms. It performs very well on high-dimensional data, discovering clusters that known algorithms cannot find, and it identifies outliers as a by-product of the cluster-formation process. A validity measure based on the main clustering criterion is also proposed for tuning the algorithm's parameters. The theoretical basis of the algorithm and its steps are presented, and its performance is illustrated by comparison with related algorithms on several data sets.
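The abstract does not give the exact Mitosis criteria, so what follows is only a hypothetical sketch of the general idea of distance relatedness, not the paper's algorithm. It assumes a local density proxy `mu` (the mean distance from a pattern to its `k` nearest neighbours) and a relatedness factor `f`; both names and the linking rule are illustrative assumptions. Two patterns are linked only when their distance is small relative to both local densities and the two densities are within a factor `f` of each other. Connected groups become clusters, and patterns that link to nothing are reported as outliers.

```python
import math
from itertools import combinations


def relatedness_clusters(points, k=2, f=1.5):
    """Hypothetical distance-relatedness clustering sketch (NOT the Mitosis algorithm).

    Assumed rule: mu[i] is the mean distance from point i to its k nearest
    neighbours. Points i and j are linked when
        d(i, j) <= f * min(mu[i], mu[j])            (distance fits both densities)
        max(mu[i], mu[j]) <= f * min(mu[i], mu[j])  (the densities are related)
    Connected components with more than one point are clusters; singletons are outliers.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    k = min(k, n - 1)

    # Full pairwise distance matrix (fine for a small illustrative example).
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Local density proxy: mean distance to the k nearest other points.
    mu = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        mu.append(sum(nearest) / len(nearest))

    # Union-find over the linked pairs.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        lo, hi = min(mu[i], mu[j]), max(mu[i], mu[j])
        if dist[i][j] <= f * lo and hi <= f * lo:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    outliers = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, outliers


if __name__ == "__main__":
    data = [
        (0, 0), (0, 1), (1, 0), (1, 1),      # dense square (spacing 1)
        (20, 0), (20, 5), (25, 0), (25, 5),  # sparse square (spacing 5)
        (10, 30),                            # isolated point
    ]
    clusters, outliers = relatedness_clusters(data, k=2, f=1.5)
    print(clusters, outliers)  # [[0, 1, 2, 3], [4, 5, 6, 7]] [8]
```

On this toy data the two squares have different densities (local mean distances of 1 and 5), so the density-ratio test keeps them apart even though a single global distance threshold large enough to join the sparse square would also have to contend with the much tighter dense one; the isolated point fails the ratio test against both groups and is flagged as an outlier.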