Categorical data fuzzy clustering: An analysis of local search heuristics

Authors:
Stefano Benati
Affiliations:
Dipartimento di Informatica e Studi Aziendali, University of Trento, Via Inama 5, 38100 Trento, Italy
Venue:
Computers and Operations Research
Year:
2008

Abstract

The fuzzy c-partition of a set of qualitative data is the problem of selecting the c centroids that are most representative of the whole population. In addition, a set of weights w_ij must be determined, describing the fuzzy membership of pattern i in the cluster represented by centroid j. Both problems are formulated as a single mathematical programming problem, which extends the classic p-median models often used for clustering. The new objective function is neither concave nor convex, and the application requires clustering many thousands of data points, so heuristic methods must be developed to find the best fuzzy partition. In this contribution, four methods are selected, all implementations of meta-heuristics previously tested on p-median problems. Here they are implemented and tested on the problem of finding the optimal fuzzy c-partition. All four heuristics implement neighborhood search, with different strategies for visiting neighboring solutions: the random restart method (RR), which is used in many commercial software packages and suggested in textbooks; tabu search (TS), which tries to find the best move to escape from a local optimum; variable neighborhood search (VNS), which explores the solution space quickly; and candidate list search (CLS), which explores only interesting starting solutions. No method is clearly best; performance depends on the problem parameters. TS is usually accurate but time-consuming. When c is small, VNS can be a reliable alternative, while when c is large and there are many data points to cluster, CLS provides good results. We point out that the simple RR method, which is sometimes used in commercial codes, is of very poor quality: implementing one of the other neighborhood search algorithms leads to substantial improvements.
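
As a rough illustration of the random restart (RR) baseline described in the abstract, the following Python sketch runs a swap-based neighborhood search from several random starting sets of centroids. The abstract does not state the objective function, so everything specific here is an assumption: the simple matching (Hamming) distance for categorical patterns, a fuzzy c-means style objective (the sum over i and j of w_ij^m * d_ij with fuzzifier m = 2), centroids restricted to data points as in p-median models, and all function names, which are hypothetical. It is not the paper's model or implementation.

```python
import random


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def memberships(dist_row, m=2.0):
    """Assumed fuzzy c-means style weights of one pattern, given its distances to the centroids."""
    zeros = [j for j, d in enumerate(dist_row) if d == 0]
    if zeros:  # pattern coincides with a centroid: full membership, split on ties
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(dist_row))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in dist_row]
    total = sum(inv)
    return [v / total for v in inv]


def objective(D, centroids, m=2.0):
    """Assumed objective: sum over patterns i and centroids j of w_ij^m * d_ij."""
    total = 0.0
    for row in D:
        d = [row[j] for j in centroids]
        w = memberships(d, m)
        total += sum(wj ** m * dj for wj, dj in zip(w, d))
    return total


def local_search(D, centroids, m=2.0):
    """Swap descent: replace one centroid by a non-centroid while the objective improves."""
    best = objective(D, centroids, m)
    improved = True
    while improved:
        improved = False
        for pos in range(len(centroids)):
            for cand in range(len(D)):
                if cand in centroids:
                    continue
                trial = centroids[:pos] + [cand] + centroids[pos + 1:]
                val = objective(D, trial, m)
                if val < best - 1e-12:
                    centroids, best, improved = trial, val, True
    return centroids, best


def random_restart(data, c, restarts=10, m=2.0, seed=0):
    """RR: run the swap descent from several random starts and keep the best result."""
    rng = random.Random(seed)
    D = [[hamming(a, b) for b in data] for a in data]
    best_centroids, best_val = None, float("inf")
    for _ in range(restarts):
        start = rng.sample(range(len(data)), c)
        centroids, val = local_search(D, start, m)
        if val < best_val:
            best_centroids, best_val = sorted(centroids), val
    return best_centroids, best_val


if __name__ == "__main__":
    data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
            ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
    print(random_restart(data, c=2))
```

Under these assumptions, TS, VNS and CLS would reuse the same swap neighborhood and differ only in how neighboring or starting solutions are chosen. Each objective evaluation above costs on the order of n*c operations, so a sketch like this is only practical for small instances, not for the many thousands of data points the abstract mentions.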