Clustering large data with uncertainty

  • Authors:
  • Sampreeti Ghosh; Sushmita Mitra

  • Affiliations:
  • Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, India; Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, India

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2013

Abstract

A new algorithm, FCLARANS, is designed for handling fuzziness while mining large data. A novel cost function, weighted by fuzzy membership, is proposed in the framework of CLARANS. A scalable approximation to the maximum number of neighbors explored at each node is also developed; it reduces the computational time for large data and eliminates the need for user-defined (heuristic) parameters in the existing equation. The goodness of the generated clusters is evaluated in terms of the Xie-Beni validity index. Results on both synthetic and real data sets demonstrate the superiority of the proposed algorithm in terms of goodness of clustering. Notably, the algorithm always converges to the globally best values at the optimal number of partitions. Moreover, compared to existing fuzzy algorithms, FCLARANS handles the uncertainty due to the overlapping nature of the various partitions without scanning the whole dataset, searching only a small number of neighbors. This is the main motivation for the fuzzification of CLARANS.
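
As an illustration of the two quantities named in the abstract, the following minimal Python sketch computes a fuzzy-membership-weighted cost over a set of medoids and the Xie-Beni index derived from it. The abstract does not state the paper's actual cost function, so the sketch assumes the standard fuzzy c-means form with fuzzifier m = 2 and FCM-style memberships. The function names (sq_dist, fuzzy_memberships, fuzzy_cost, xie_beni) and the sample data are hypothetical and chosen for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_memberships(points, medoids, m=2.0):
    """FCM-style memberships u[i][j] of point j in cluster i (assumed form).

    For each point the memberships over all clusters sum to 1.
    """
    c = len(medoids)
    u = [[0.0] * len(points) for _ in range(c)]
    for j, p in enumerate(points):
        d = [sq_dist(p, med) for med in medoids]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # The point coincides with one or more medoids: share membership among them.
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        # With squared distances D, u_ij is proportional to D_ij ** (-1 / (m - 1)).
        inv = [di ** (-1.0 / (m - 1.0)) for di in d]
        total = sum(inv)
        for i in range(c):
            u[i][j] = inv[i] / total
    return u


def fuzzy_cost(points, medoids, u, m=2.0):
    """Sum over clusters i and points j of u_ij**m times squared distance."""
    return sum(
        u[i][j] ** m * sq_dist(p, med)
        for i, med in enumerate(medoids)
        for j, p in enumerate(points)
    )


def xie_beni(points, medoids, u, m=2.0):
    """Xie-Beni index: fuzzy cost / (n * minimum squared medoid separation).

    Requires at least two distinct medoids.
    """
    min_sep = min(
        sq_dist(a, b)
        for i, a in enumerate(medoids)
        for k, b in enumerate(medoids)
        if i < k
    )
    return fuzzy_cost(points, medoids, u, m) / (len(points) * min_sep)


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_medoids = [(0.0, 0.0), (10.0, 10.0)]   # one medoid per group
bad_medoids = [(0.0, 0.0), (0.0, 1.0)]      # both medoids in the same group

xb_good = xie_beni(points, good_medoids, fuzzy_memberships(points, good_medoids))
xb_bad = xie_beni(points, bad_medoids, fuzzy_memberships(points, bad_medoids))
print(xb_good, xb_bad)
```

Lower Xie-Beni values indicate more compact and better separated partitions, which is how the index is typically used to compare clusterings obtained for different numbers of clusters.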