Approximate clustering in very large relational data

Authors:
James C. Bezdek;Richard J. Hathaway;Jacalyn M. Huband;Christopher Leckie;Ramamohanarao Kotagiri
Affiliations:
Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA;Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460, USA;Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA;Department of Computer Science and Software Engineering, University of Melbourne, Victoria, 3010, Australia;Department of Computer Science and Software Engineering, University of Melbourne, Victoria, 3010, Australia
Venue:
International Journal of Intelligent Systems
Year:
2006

Citing 0
Cited 14

Approximate data mining in very large relational data
ADC '06 Proceedings of the 17th Australasian Database Conference - Volume 49

Scalable visual assessment of cluster tendency for large data sets
Pattern Recognition

Approximate Spectral Clustering
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

Density-weighted fuzzy c-means clustering
IEEE Transactions on Fuzzy Systems

eCCV: a new fuzzy cluster validity measure for large relational bioinformatics datasets
FUZZ-IEEE'09 Proceedings of the 18th international conference on Fuzzy Systems

Fuzzy clustering with weighted medoids for relational data
Pattern Recognition

Median fuzzy c-means for clustering dissimilarity data
Neurocomputing

Approximate pairwise clustering for large data sets via sampling plus extension
Pattern Recognition

Feature selection for unlabeled data
ICSI'11 Proceedings of the Second international conference on Advances in swarm intelligence - Volume Part II

Maximin initialization for cluster analysis
CIARP'06 Proceedings of the 11th Iberoamerican conference on Progress in Pattern Recognition, Image Analysis and Applications

Vector quantization based approximate spectral clustering of large datasets
Pattern Recognition

A semi-supervised feature selection method using a non-parametric technique with pairwise instance constraints
Journal of Information Science

A sample-based hierarchical adaptive K-means clustering method for large-scale video retrieval
Knowledge-Based Systems

Online fuzzy medoid based clustering algorithms
Neurocomputing

Abstract

Different extensions of fuzzy c-means (FCM) clustering have been developed to approximate FCM clustering in very large (VL, unloadable) image data (eFFCM) and object vector data (geFFCM). Both extensions share three phases: (1) progressive sampling of the VL data, terminated when a sample passes a statistical goodness-of-fit test; (2) clustering the sample with literal (exact) FCM; and (3) noniterative extension of the literal clusters to the remainder of the data set. This article presents a comparable method for the remaining case of interest, namely, clustering in VL relational data. We propose and discuss each of the four phases of eNERF, our algorithm for this last case: (1) finding distinguished features that monitor progressive sampling; (2) progressively sampling a square N × N relation matrix R_N until an n × n sample relation R_n passes a statistical test; (3) clustering R_n with literal non-Euclidean relational fuzzy c-means (NERF); and (4) extending the clusters in R_n to the remainder of the relational data. The extension phase in this third case is not as straightforward as it was in the image and object data cases, but our numerical examples suggest that eNERF has the same approximation qualities that eFFCM and geFFCM do. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 817–841, 2006.
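The sample, cluster, extend pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' eNERF implementation, and the abstract gives no formulas, so several things here are assumptions: a single uniform random sample stands in for progressive sampling with a statistical test; plain relational fuzzy c-means (without the non-Euclidean correction that NERF adds) stands in for the literal clustering step; and the extension rule simply reuses the relational distance formula for out-of-sample objects. All function names (`relational_distances`, `memberships`, `rfcm`, `extend`) are hypothetical.

```python
import numpy as np

def relational_distances(R_rows, V, R_n):
    """Squared distance from each object (a row of R_rows, holding its
    dissimilarities to the n sample objects) to each implicit cluster center."""
    cross = V @ R_rows.T                                   # shape (c, rows)
    self_term = 0.5 * np.einsum("ij,jk,ik->i", V, R_n, V)  # shape (c,)
    return cross - self_term[:, None]

def memberships(D, m):
    """Standard fuzzy c-means membership update; columns sum to 1."""
    W = np.maximum(D, 1e-12) ** (-1.0 / (m - 1.0))
    return W / W.sum(axis=0)

def rfcm(R_n, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Relational fuzzy c-means on an n x n sample relation R_n.
    Assumption: R_n holds squared Euclidean dissimilarities, so no
    non-Euclidean correction is applied."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, R_n.shape[0]))
    U /= U.sum(axis=0)
    for _ in range(iters):
        V = U ** m
        V /= V.sum(axis=1, keepdims=True)   # center weights over sample objects
        U_new = memberships(relational_distances(R_n, V, R_n), m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    V = U ** m
    V /= V.sum(axis=1, keepdims=True)
    return U, V

def extend(R_rest_to_sample, V, R_n, m=2.0):
    """Noniterative extension (assumed rule): one membership update for
    objects outside the sample, using only their rows against the sample."""
    return memberships(relational_distances(R_rest_to_sample, V, R_n), m)

# Toy data: two well-separated blobs. The full matrix R is built here only
# for the demonstration; in the VL setting it would not fit in memory.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 200)
X = rng.normal(size=(400, 2)) + 10.0 * truth[:, None]
R = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

sample = rng.choice(400, size=60, replace=False)  # stand-in for progressive sampling
rest = np.setdiff1d(np.arange(400), sample)
R_n = R[np.ix_(sample, sample)]

U_sample, V = rfcm(R_n, c=2)
U_rest = extend(R[np.ix_(rest, sample)], V, R_n)

labels = U_rest.argmax(axis=0)
# Cluster indices are arbitrary, so compare up to a label swap.
agreement = max(np.mean(labels == truth[rest]), np.mean(labels != truth[rest]))
```

Only the n × n sample relation and the (N − n) × n block of dissimilarities between the remaining objects and the sample are ever touched, which is the point of the approach: the full N × N matrix never has to be clustered directly.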