Categorical data fuzzy clustering: An analysis of local search heuristics

  • Authors:
  • Stefano Benati

  • Affiliations:
  • Dipartimento di Informatica e Studi Aziendali, University of Trento, Via Inama 5, 38100 Trento, Italy

  • Venue:
  • Computers and Operations Research
  • Year:
  • 2008

Quantified Score

Hi-index 0.02

Visualization

Abstract

The fuzzy c partition of a set of qualitative data is the problem of selecting the optimal c centroids that are the most representative of the whole population. Moreover, a set of weights wij must be determined, describing the fuzzy membership function of pattern i to the cluster represented by centroid j. Both problems are formulated by a single mathematical programming problem, that is an extension of the classic p-median models often used for clustering. The new objective function is neither concave nor convex and the application requires the clustering of many thousands of data, therefore heuristic methods are to be developed to find the best fuzzy partition. In this contribution, four methods are selected, that are implementations of meta-heuristics tested to solve p-median problems. Here, they are implemented and tested to find the optimal fuzzy c-partition. All heuristics implement neighborhood search with different strategies of visiting neighboring solutions: they are random restart method (RR), that is used in many commercial softwares and suggested in textbooks, tabu search (TS) that tries to find the best move to escape from a local optimum, variable neighborhood search (VNS), that explores quickly the solution space, candidate list search (CLS), that explores only interesting starting solutions. It is found that there is not a clear best method, but their performance depends on some parameter. TS is usually accurate, but time consuming. When c is small, VNS can be a reliable alternative, while, when c is large and there are many data to cluster, CLS provides good results. We point out that the simple RR method, that is sometimes used in commercial codes is of very poor quality: the implementation of one of the neighbor search algorithms leads to substantial improvements.