Reusable components for partitioning clustering algorithms

  • Authors:
  • Boris Delibašić; Kathrin Kirchner; Johannes Ruhland; Miloš Jovanović; Milan Vukićević

  • Affiliations:
  • Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia; Faculty of Economics and Business Administration, Friedrich Schiller University of Jena, Jena, Germany; Faculty of Economics and Business Administration, Friedrich Schiller University of Jena, Jena, Germany; Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia; Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia

  • Venue:
  • Artificial Intelligence Review
  • Year:
  • 2009

Abstract

Clustering algorithms are well established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions to specific sub-problems in the clustering process. These solutions are linked together within the algorithm, and they define its process and structure. Many of these solutions occur in more than one clustering algorithm; most new clustering algorithms combine frequently occurring solutions to typical sub-problems from clustering, as well as from other machine-learning algorithms. The problem is that these solutions are usually embedded in their algorithms, and the original algorithms are not designed to let their sub-problem solutions be shared easily outside them. We propose a way of designing new clustering algorithms, and improving existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a specific area. We first identify reusable components as solutions to characteristic sub-problems in partitioning clustering algorithms, and then identify a generic structure for the design of partitioning clustering algorithms. We analyze several partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM), identify the reusable components in them, and give examples of how new clustering algorithms can be designed from these components.
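To make the idea of a generic structure with interchangeable components more concrete, here is a minimal sketch in Python. It assumes a simple decomposition into an initialization component, a distance component, and a representative-update component, plugged into a shared assign/update/stop loop. The function names (`partition`, `random_init`, `euclidean`, `manhattan`, `mean_update`, `median_update`) are hypothetical and chosen for illustration; this is not the authors' component catalogue, and the paper's actual decomposition may differ. Swapping Euclidean distance and mean update for Manhattan distance and coordinate-wise median update turns a K-means-style algorithm into a K-medians-style one without touching the skeleton.

```python
import math
import random
from statistics import median
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Centroids = List[List[float]]

# --- Hypothetical reusable components (illustrative only) ---

def random_init(data: Sequence[Point], k: int, rng: random.Random) -> Centroids:
    """Initialization component: pick k distinct data points as starting representatives."""
    return [list(p) for p in rng.sample(list(data), k)]

def euclidean(a: Point, b: Point) -> float:
    """Distance component: Euclidean (L2) distance."""
    return math.dist(a, b)

def manhattan(a: Point, b: Point) -> float:
    """Distance component: city-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mean_update(members: Sequence[Point]) -> List[float]:
    """Update component: coordinate-wise mean, as in K-means."""
    return [sum(col) / len(col) for col in zip(*members)]

def median_update(members: Sequence[Point]) -> List[float]:
    """Update component: coordinate-wise median, as in K-medians."""
    return [median(col) for col in zip(*members)]

# --- Generic partitioning skeleton that wires the components together ---

def partition(
    data: Sequence[Point],
    k: int,
    init: Callable[[Sequence[Point], int, random.Random], Centroids],
    distance: Callable[[Point, Point], float],
    update: Callable[[Sequence[Point]], List[float]],
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], Centroids]:
    rng = random.Random(seed)
    centroids = init(data, k, rng)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest representative.
        new_labels = [min(range(k), key=lambda j: distance(p, centroids[j])) for p in data]
        # Stopping criterion: assignments did not change since the last pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each representative; keep the old one if its cluster is empty.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = update(members)
    return labels, centroids
```

For example, `partition(data, 2, random_init, euclidean, mean_update)` behaves like plain K-means, while `partition(data, 2, random_init, manhattan, median_update)` reuses the same skeleton as a K-medians variant. Keeping the loop fixed and passing components as functions is one simple way to express "solutions to sub-problems" that can be shared across algorithms.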