Approximate data mining in very large relational data

  • Authors:
  • James C. Bezdek; Richard J. Hathaway; Christopher Leckie; Ramamohanarao Kotagiri

  • Affiliations:
  • Department of Computer Science, University of West Florida, Pensacola, FL; Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA; Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia; Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia

  • Venue:
  • ADC '06 Proceedings of the 17th Australasian Database Conference - Volume 49
  • Year:
  • 2006

Abstract

In this paper we discuss eNERF, an extended version of non-Euclidean relational fuzzy c-means (NERFCM) for approximate clustering in very large (unloadable) relational data. The eNERF procedure consists of four parts: (i) selection, by algorithm DF, of distinguished features to be monitored during progressive sampling; (ii) progressive sampling of a square N×N relation matrix R_N by algorithm PS until an n×n sample relation R_n passes a goodness-of-fit test; (iii) clustering of R_n using algorithm LNERF; and (iv) extension of the LNERF results to R_N − R_n by algorithm xNERF, which uses an iterative procedure based on LNERF to compute fuzzy membership values for all of the objects remaining after LNERF clustering of the accepted sample. Three of the four algorithms are new; only LNERF (called NERFCM in the original literature) predates this article.
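
Steps (iii) and (iv) can be illustrated with a minimal sketch. It is not the paper's LNERF or xNERF. It assumes plain relational fuzzy c-means on a small sample matrix of squared dissimilarities, with negative distances simply clamped rather than handled by the non-Euclidean correction of NERFCM, and it uses a one-pass (non-iterative) extension where the abstract describes an iterative one. Steps (i) and (ii) (DF and PS) are omitted. The names `rfcm`, `extend`, `_distances` and `_memberships` are hypothetical.

```python
import random

def _distances(row, V, offsets):
    # Relational distance of one object to each cluster:
    # d_i = row . v_i - 0.5 * v_i^T D v_i, clamped to stay positive (simplification).
    return [max(sum(r * v for r, v in zip(row, V[i])) - offsets[i], 1e-12)
            for i in range(len(V))]

def _memberships(d, m):
    # Standard fuzzy c-means membership update from cluster distances.
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

def rfcm(D, c, m=2.0, iters=100, tol=1e-8, seed=0):
    """Cluster an n x n sample relation D (squared dissimilarities).

    Returns (U, V, offsets): U[k][i] is the membership of object k in
    cluster i; V[i] is the weight vector of cluster i over the sample.
    """
    n = len(D)
    rng = random.Random(seed)
    U = []
    for _ in range(n):  # random initial fuzzy partition, rows sum to 1
        raw = [rng.random() + 1e-3 for _ in range(c)]
        s = sum(raw)
        U.append([x / s for x in raw])
    V, offsets = [], []
    for _ in range(iters):
        V, offsets = [], []
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            s = sum(w)
            v = [x / s for x in w]
            Dv = [sum(D[k][j] * v[j] for j in range(n)) for k in range(n)]
            V.append(v)
            offsets.append(0.5 * sum(v[k] * Dv[k] for k in range(n)))
        new_U = [_memberships(_distances(D[k], V, offsets), m) for k in range(n)]
        change = max(abs(a - b) for old, new in zip(U, new_U)
                     for a, b in zip(old, new))
        U = new_U
        if change < tol:
            break
    return U, V, offsets

def extend(row, V, offsets, m=2.0):
    """Memberships for an out-of-sample object, given its squared
    dissimilarities to the n sample objects (one pass, no iteration)."""
    return _memberships(_distances(row, V, offsets), m)
```

In this sketch, only the n×n sample relation and one length-n row per remaining object are ever held in memory, which is the property that makes a sample-then-extend scheme usable on a relation too large to load.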