Optimal clustering in the context of overlapping cluster analysis

  • Authors: Wim De Mulder

  • Affiliations: Systems Research Group, University of Ghent, Ghent 9052, Belgium

  • Venue: Information Sciences: an International Journal
  • Year: 2013

Abstract

In this paper we give a general definition of the concept 'optimal clustering' that is applicable to overlapping clusterings. Overlapping clusterings generalize hard clusterings, and their structure is formally developed in this paper. It is generally assumed that the domain of clustering is too heuristic to admit a general, i.e. axiomatic, definition of an optimal clustering. We show, however, that such a definition can be given within the domain of overlapping clusterings by using the new concept of dual clustering, which is developed in this paper. A second concept underlying our definition of optimal clustering is the average clustering, which also plays an important role in the domain of cluster ensembles. Using these general concepts, we then show that, under some conditions, the final hard clustering extracted by majority vote from a given set of clusterings is guaranteed to be optimal over all hard clusterings. Unlike traditional research on validating clusterings, we do not add a new cluster validation measure to the many existing ones; rather, we develop a general framework for cluster validation measures, at least within the domain of overlapping clusterings. This framework makes it possible to derive some general theorems about clustering.
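
The abstract does not give the paper's formal definitions, so the following is only an illustrative sketch of how an average clustering and a majority-vote hard clustering are commonly computed in the cluster-ensemble setting. It assumes that every input clustering is hard, that cluster labels are already aligned across clusterings, and that the average is taken over 0/1 membership matrices. The function names (`membership_matrix`, `average_clustering`, `majority_vote`) and the toy data are hypothetical and do not come from the paper. The concept of dual clustering is not sketched, since the abstract does not define it.

```python
import numpy as np


def membership_matrix(labels, n_clusters):
    """Turn a hard clustering (one label per object) into a 0/1 matrix
    with one row per object and one column per cluster."""
    labels = np.asarray(labels)
    m = np.zeros((labels.size, n_clusters))
    m[np.arange(labels.size), labels] = 1.0
    return m


def average_clustering(clusterings, n_clusters):
    """Element-wise mean of the membership matrices (assumed definition).
    Each row holds the fraction of clusterings that place the object
    in each cluster, so the result is in general a soft clustering."""
    matrices = [membership_matrix(c, n_clusters) for c in clusterings]
    return np.mean(matrices, axis=0)


def majority_vote(avg):
    """Assign each object to the cluster with the most votes.
    Note: argmax picks the plurality winner and breaks ties in favour
    of the lowest cluster index; it does not require a strict majority."""
    return np.argmax(avg, axis=1)


# Toy ensemble: three label-aligned hard clusterings of five objects.
clusterings = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 1, 1, 1, 2],
]

avg = average_clustering(clusterings, n_clusters=3)
hard = majority_vote(avg)
print(avg)
print(hard)
```

In this sketch the average clustering has rows that sum to one, and the extracted hard clustering simply keeps the most frequent label per object. Whether that extracted clustering is optimal depends on the conditions established in the paper, which this example does not check.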