SyMP: an efficient clustering approach to identify clusters of arbitrary shapes in large data sets

Authors:
Hichem Frigui
Affiliations:
University of Memphis
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

References (13):

BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Clustering techniques for large data sets - from the past to the future
KDD '99 Tutorial notes of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Scalability for clustering algorithms revisited
ACM SIGKDD Explorations Newsletter

Self-Organization of Pulse-Coupled Oscillators with Application to Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence

Stochastic Complexity in Statistical Inquiry Theory

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering

A Synchronization Based Algorithm for Discovering Ellipsoidal Clusters in Large Datasets
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining

WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

STING: A Statistical Information Grid Approach to Spatial Data Mining
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases

Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation. I
IEEE Transactions on Fuzzy Systems

Cited by (1):

A new data clustering approach: Generalized cellular automata
Information Systems

Abstract

We propose a new clustering algorithm, called SyMP, which is based on the synchronization of pulse-coupled oscillators. SyMP represents each data point by an Integrate-and-Fire oscillator and uses the relative similarity between the points to model the interaction between the oscillators. SyMP is robust to noise and outliers, determines the number of clusters in an unsupervised manner, identifies clusters of arbitrary shapes, and can handle very large data sets. The robustness of SyMP is an intrinsic property of the synchronization mechanism. To determine the optimum number of clusters, SyMP uses a dynamic resolution parameter. To identify clusters of various shapes, SyMP models each cluster by multiple Gaussian components, whose number is determined automatically using a dynamic intra-cluster resolution parameter. Clusters with simple shapes are modeled by a few components, while clusters with more complex shapes require a larger number of components. The scalable version of SyMP uses an efficient incremental approach that requires a simple pass through the data set. The proposed clustering approach is empirically evaluated on several synthetic and real data sets, and its performance is compared with that of CURE.
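
The oscillator idea in the abstract can be illustrated with a toy model. The sketch below is not SyMP itself and makes several assumptions that do not come from the paper: the function name `toy_pulse_sync_clusters`, the parameters `delta` (similarity radius), `eps` (pulse strength), `b` (concavity) and `steps`, the evenly spaced starting phases, and the concave logarithmic state function borrowed from the classic Mirollo-Strogatz pulse-coupled oscillator model. It uses only excitatory pulses between similar points and records synchronization with a union-find structure. It omits the dynamic resolution parameters, the Gaussian components, and the incremental pass.

```python
import math


def toy_pulse_sync_clusters(points, delta=1.0, eps=0.3, b=3.0, steps=50):
    """Toy integrate-and-fire clustering; illustrative only, not SyMP."""
    n = len(points)
    scale = math.exp(b) - 1.0
    phases = [i / n for i in range(n)]  # assumed deterministic start
    parent = list(range(n))             # union-find over synchronized oscillators

    def find(i):
        # Root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def similar(i, j):
        # Assumed similarity test: Euclidean distance below delta.
        return math.dist(points[i], points[j]) < delta

    for _ in range(steps):
        top = max(phases)
        firing = [i for i, p in enumerate(phases) if p == top]
        fired = set(firing)
        new_phases = []
        for j, p in enumerate(phases):
            if j in fired:
                new_phases.append(0.0)  # firing oscillators reset
                continue
            p += 1.0 - top  # advance time until the leaders reach threshold
            source = next((i for i in firing if similar(i, j)), None)
            if source is not None:
                # Concave state of the current phase, plus one excitatory pulse.
                x = math.log(1.0 + scale * p) / b + eps
                if x >= 1.0:
                    # Pulled over threshold: j fires with the source and
                    # is recorded as synchronized with it.
                    p = 0.0
                    parent[find(j)] = find(source)
                else:
                    p = (math.exp(b * x) - 1.0) / scale  # back to a phase
            new_phases.append(p)
        phases = new_phases
    return [find(i) for i in range(n)]


# Two compact groups and one isolated point (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
labels = toy_pulse_sync_clusters(points)
print(labels)
```

Points that end up with the same label fired together at some step. The isolated point never receives a pulse, so it keeps its own label, which loosely mirrors the abstract's claim that robustness to outliers follows from the synchronization mechanism. The number of groups is not supplied by the caller. It emerges from the fixed `delta`, whereas SyMP is described as using a dynamic resolution parameter.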