Finding Consistent Clusters in Data Partitions

  • Authors: Ana L. N. Fred

  • Venue: MCS '01 Proceedings of the Second International Workshop on Multiple Classifier Systems
  • Year: 2001

Abstract

Given an arbitrary data set for which no particular parametric, statistical, or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. Indeed, even a single clustering algorithm can produce several partitions, depending on its initialization or on the value chosen for some design parameter. This paper addresses the problem of finding consistent clusters in data partitions by analyzing the most common associations among samples in a majority voting scheme. Clustering results are combined by transforming the data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples using both simulated and real data show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependency on initialization. Furthermore, the resulting clusters are not constrained to be hyperspherically shaped.
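
The co-association idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `co_association` and `consistent_clusters`, the 0.5 voting threshold, and the rule of merging samples through connected components are assumptions made here for illustration. The partitions are passed in as plain label lists (for example, the outputs of several k-means runs) rather than computed.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    for i in range(n):
        counts[i][i] = m  # a sample is always grouped with itself
    return [[c / m for c in row] for row in counts]


def consistent_clusters(co_assoc, threshold=0.5):
    """Merge samples whose co-association exceeds the threshold.

    Assumption: a pair 'wins the vote' when it is grouped together in more
    than `threshold` of the partitions; winning pairs are joined transitively.
    """
    n = len(co_assoc)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co_assoc[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical partitions of six samples; label values are arbitrary
# and the second partition uses a different number of clusters.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
matrix = co_association(partitions)
labels = consistent_clusters(matrix)
print(labels)
```

Because the vote depends only on whether two samples share a cluster, the input partitions may use different label values and different numbers of clusters; the number of consistent clusters falls out of the merging step rather than being fixed in advance.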