A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities

Authors:
Noha A. Yousri; Mohamed S. Kamel; Mohamed A. Ismail
Affiliations:
Computers and System Engineering, University of Alexandria, Egypt and Electrical and Computer Engineering, University of Waterloo, Ontario, Canada; Electrical and Computer Engineering, University of Waterloo, Ontario, Canada; Computers and System Engineering, University of Alexandria, Egypt
Venue:
Pattern Recognition
Year:
2009

Citing 17
Cited 3

On the color image segmentation algorithm based on the thresholding and the fuzzy C-means techniques

Pattern Recognition
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Data clustering: a review

ACM Computing Surveys (CSUR)
Clustering Algorithms

Clustering Algorithms
Cluster validity methods: part I

ACM SIGMOD Record
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
CLARANS: A Method for Clustering Objects for Spatial Data Mining

IEEE Transactions on Knowledge and Data Engineering
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
A Method for Clustering the Experiences of a Mobile Robot that Accords with Human Judgments

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Phrase-based Document Similarity Based on an Index Graph Model

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Efficient Phrase-Based Document Indexing for Web Document Clustering

IEEE Transactions on Knowledge and Data Engineering
Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters

IEEE Transactions on Computers
Survey of clustering algorithms

IEEE Transactions on Neural Networks

Improving DBSCAN's execution time by using a pruning technique on bit vectors

Pattern Recognition Letters
A multi-objective sequential ensemble for cluster structure analysis and visualization and application to gene expression

MCS'10 Proceedings of the 9th international conference on Multiple Classifier Systems
A possibilistic density based clustering for discovering clusters of arbitrary shapes and densities in high dimensional data

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part III

Quantified Score

Hi-index	0.01

Abstract

It is important to find the natural clusters in high dimensional data, where visualization is difficult. A natural cluster may have any shape and density; it should not be restricted to a globular shape, as many algorithms assume, or to a specific user-defined density, as some density-based algorithms require. This work proposes to solve the problem by maximizing the relatedness of distances between patterns in the same cluster, which makes it possible to distinguish clusters by their distance-based densities. A novel dynamic model is proposed, based on new distance-relatedness measures and clustering criteria. The proposed algorithm, "Mitosis", discovers clusters of arbitrary shapes and arbitrary densities in high dimensional data. Its computational complexity compares well with that of related algorithms. It performs very well on high dimensional data, discovering clusters that cannot be found by known algorithms, and it identifies outliers in the data as a by-product of the cluster formation process. A validity measure that depends on the main clustering criterion is also proposed for tuning the algorithm's parameters. The theoretical bases of the algorithm and its steps are presented, and its performance is illustrated by comparison with related algorithms on several data sets.
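
The idea of grouping patterns whose mutual distances are related can be illustrated with a small sketch. This is not the Mitosis algorithm, whose measures and criteria are given only in the paper itself; it is a toy under stated assumptions. The function name relatedness_clusters, the factor k, the threshold min_size, and the use of the nearest-neighbour distance as a point's local scale are all hypothetical choices made for illustration. Pairs are visited in order of increasing distance, and two groups are merged only when the connecting distance and both groups' average internal distances lie within a factor k of each other. Points left in groups smaller than min_size are labelled -1, loosely mirroring the abstract's remark that outliers emerge as a by-product.

```python
import math
from itertools import combinations


def relatedness_clusters(points, k=2.0, min_size=3):
    """Toy illustration only (NOT the published Mitosis algorithm).

    Merges groups in order of increasing pair distance when the connecting
    distance is within a factor k of both groups' local distance scales.
    Returns one label per point; -1 marks points in groups below min_size.
    """
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Assumed local scale of a single point: distance to its nearest neighbour.
    scale = [min(dist(i, j) for j in range(n) if j != i) for i in range(n)]

    parent = list(range(n))  # union-find forest
    total = [0.0] * n        # sum of merge distances per group (indexed by root)
    count = [0] * n          # number of merge distances per group
    size = [1] * n           # number of points per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def group_scale(root):
        # Average internal merge distance, or the point's own scale if alone.
        return total[root] / count[root] if count[root] else scale[root]

    pairs = sorted(combinations(range(n), 2), key=lambda p: dist(*p))
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        d = dist(i, j)
        lo, hi = sorted((group_scale(ri), group_scale(rj)))
        # Merge only if the link and both group scales are mutually related.
        if d <= k * lo and hi <= k * lo:
            parent[rj] = ri
            total[ri] += total[rj] + d
            count[ri] += count[rj] + 1
            size[ri] += size[rj]

    roots = [find(i) for i in range(n)]
    return [r if size[r] >= min_size else -1 for r in roots]


# A dense group, a sparser group, and one isolated point.
dense = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
sparse = [(10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
labels = relatedness_clusters(dense + sparse + [(100.0, 100.0)])
```

Because the merge test compares distances to local scales rather than to a single global threshold, the dense and sparse groups in this example are each kept together without a user-defined density, which is the property the abstract emphasizes.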