Approximate clustering in very large relational data

  • Authors:
  • James C. Bezdek;Richard J. Hathaway;Jacalyn M. Huband;Christopher Leckie;Ramamohanarao Kotagiri

  • Affiliations:
  • Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA;Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460, USA;Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA;Department of Computer Science and Software Engineering, University of Melbourne, Victoria, 3010, Australia;Department of Computer Science and Software Engineering, University of Melbourne, Victoria, 3010, Australia

  • Venue:
  • International Journal of Intelligent Systems
  • Year:
  • 2006

Abstract

Different extensions of fuzzy c-means (FCM) clustering have been developed to approximate FCM clustering in very large (VL), unloadable image (eFFCM) and object vector (geFFCM) data. Both extensions share three phases: (1) progressive sampling of the VL data, terminated when a sample passes a statistical goodness-of-fit test; (2) clustering the sample with (literal or exact) FCM; and (3) noniterative extension of the literal clusters to the remainder of the data set. This article presents a comparable method for the remaining case of interest, namely, clustering in VL relational data. We propose and discuss each of the four phases of eNERF, our algorithm for this last case: (1) finding distinguished features that monitor progressive sampling, (2) progressively sampling a square N × N relation matrix R_N until an n × n sample relation R_n passes a statistical test, (3) clustering R_n with literal non-Euclidean relational fuzzy c-means, and (4) extending the clusters in R_n to the remainder of the relational data. The extension phase in this third case is not as straightforward as it was in the image and object data cases, but our numerical examples suggest that eNERF has the same approximation qualities that eFFCM and geFFCM do. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 817–841, 2006.
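As an illustration of the literal clustering phase only (clustering a loadable n × n sample relation), the following is a minimal sketch of relational fuzzy c-means with a spreading guard for non-Euclidean input, written from the commonly published update equations. It is not the authors' implementation: the function name `nerf_sketch`, its parameters, and the toy data are assumptions made for this example, and the sampling and extension phases are not shown.

```python
import numpy as np

def nerf_sketch(R, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: relational fuzzy c-means on a symmetric,
    zero-diagonal dissimilarity matrix R (n x n). Returns a c x n
    membership matrix whose columns sum to 1."""
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                      # each object's memberships sum to 1
    beta = 0.0                              # accumulated off-diagonal spread
    off_diag = 1.0 - np.eye(n)
    for _ in range(max_iter):
        W = U ** m
        V = W / W.sum(axis=1, keepdims=True)        # row i is the weight vector v_i
        Rb = R + beta * off_diag                    # spread relation
        RV = V @ Rb                                 # entry (i, k) is (Rb v_i)_k, Rb symmetric
        quad = np.einsum("ij,ij->i", RV, V)         # v_i^T Rb v_i
        d = RV - 0.5 * quad[:, None]                # relational "distances" d_ik
        neg = d < 0
        if neg.any():
            # ||v_i - e_k||^2 = ||v_i||^2 - 2 v_ik + 1
            sq = (V ** 2).sum(axis=1, keepdims=True) - 2.0 * V + 1.0
            delta = np.max(-2.0 * d[neg] / sq[neg]) # smallest spread making all d_ik >= 0
            d = d + 0.5 * delta * sq
            beta += delta
        d = np.maximum(d, 1e-12)                    # avoid division by zero
        inv = d ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)               # standard fuzzy membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U

# Toy sample relation (assumed data): squared distances between six points
# on a line that form two well-separated groups.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
R = (x[:, None] - x[None, :]) ** 2
U = nerf_sketch(R, c=2)
labels = U.argmax(axis=0)                   # hardened cluster labels
```

For the toy relation the dissimilarities are squared Euclidean distances, so the spreading branch is not expected to trigger; it matters only when the relation cannot be realized by points in a Euclidean space.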