Partitions selection strategy for set of clustering solutions

  • Authors:
  • Katti Faceli;Tiemi C. Sakata;Marcilio C. P. de Souto;André C. P. L. F. de Carvalho

  • Affiliations:
  • Universidade Federal de São Carlos - Campus Sorocaba, Rodovia João Leme dos Santos, Km 110, 18052-780 - Sorocaba, SP, Brazil;Universidade Federal de São Carlos - Campus Sorocaba, Rodovia João Leme dos Santos, Km 110, 18052-780 - Sorocaba, SP, Brazil;Universidade Federal do Rio Grande do Norte - Departamento de Informática e Matemática Aplicada, Campus Universitário, 59072-970 - Natal, RN, Brazil;Universidade de São Paulo - ICMC, Departamento de Ciências de Computação, Caixa Postal 668, 13560-970 - São Carlos, SP, Brazil

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Abstract

Clustering is a difficult task: there is no single definition of a cluster, and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms, e.g., MOCK (Multi-Objective Clustering with automatic K-determination) and MOCLE (Multi-Objective Clustering Ensemble), were proposed to tackle these problems. However, the output of such algorithms often contains a large number of partitions, making it difficult for an expert to analyze all of them manually. To deal with this problem, we present two selection strategies, both based on the corrected Rand index, that choose a subset of the solutions. To test them, we apply them to the sets of solutions produced by MOCK and MOCLE on several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both proposed selection strategies are very effective: they can significantly reduce the number of solutions while preserving the quality and diversity of the partitions in the original set.
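
The abstract does not describe the two strategies in detail, so the following is only an illustrative sketch of how a corrected Rand index can drive the selection of a reduced set of partitions. It is not the authors' method. The function names `corrected_rand` and `select_diverse`, the greedy filtering rule, and the `threshold` value are all assumptions introduced for this example.

```python
from collections import Counter
from math import comb


def corrected_rand(a, b):
    """Corrected (adjusted) Rand index between two labelings of the same objects."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same objects")
    total = comb(len(a), 2)  # number of object pairs
    if total == 0:
        return 1.0  # fewer than two objects: trivially identical
    # Pairs of objects placed together in both, in a, and in b.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / total
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (pairs_both - expected) / (max_index - expected)


def select_diverse(partitions, threshold=0.9):
    """Hypothetical greedy filter (an assumption, not the paper's strategy).

    Keeps a partition only if its corrected Rand index with every partition
    already kept is below `threshold`. Returns the indices of kept partitions.
    """
    kept = []
    for i, candidate in enumerate(partitions):
        if all(corrected_rand(candidate, partitions[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Four partitions of four objects; the 2nd and 4th duplicate the 1st
# (the 2nd only swaps the label names).
solutions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
print(select_diverse(solutions))  # [0, 2]
```

In this sketch, near-duplicate partitions are discarded while structurally different ones survive, which mirrors the stated goal of reducing the number of solutions without losing diversity. Preserving quality would need an additional criterion that the abstract does not specify.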