A general stochastic clustering method for automatic cluster discovery

Authors:
Swee Chuan Tan; Kai Ming Ting; Shyh Wei Teng
Affiliations:
SIM University, 461 Clementi Road, Singapore; Monash University, Gippsland School of Information Technology, Australia; Monash University, Gippsland School of Information Technology, Australia
Venue:
Pattern Recognition
Year:
2011

Abstract

Finding clusters in data is a challenging problem. Given a dataset, we usually do not know the number of natural clusters hidden in it. The problem is exacerbated when there is little or no additional information beyond the data itself. This paper proposes a general stochastic clustering method that is a simplification of the nature-inspired ant-based clustering approach. It begins with a basic solution and then performs a stochastic search to incrementally improve the solution until the underlying clusters emerge, resulting in automatic cluster discovery in datasets. This method differs from several recent methods in that it does not require users to input the number of clusters and it makes no explicit assumption about the underlying distribution of a dataset. Our experimental results show that the proposed method performs better than several existing methods in terms of clustering accuracy and efficiency on the majority of the datasets used in this study. Our theoretical analysis shows that the proposed method has linear time and space complexities, and our empirical study shows that it can accurately and efficiently discover clusters in large datasets on which many existing methods fail to run.
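
The idea of starting from a basic solution and improving it by stochastic search can be illustrated with a toy sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. It is a generic stochastic local search under assumptions made here for illustration only: every point starts as its own cluster, a random point may join the cluster of another random point, and a move is accepted when the two points lie within a hypothetical `radius` and the target cluster is at least as large as the source cluster. The function name `stochastic_cluster` and the parameters `radius`, `n_iter` and `seed` are likewise hypothetical. Unlike the method described above, this sketch needs a distance threshold, although it also takes no cluster count as input.

```python
import math
import random


def stochastic_cluster(points, radius, n_iter=5000, seed=0):
    """Toy stochastic search for clusters (illustrative, not the paper's method).

    Starts from the trivial solution in which every point is its own
    cluster, then repeatedly tries a random single-point move.
    """
    rng = random.Random(seed)
    n = len(points)
    labels = list(range(n))  # basic solution: one cluster per point
    sizes = [1] * n          # sizes[c] = number of points labelled c

    for _ in range(n_iter):
        i, j = rng.sample(range(n), 2)  # two distinct random points
        ci, cj = labels[i], labels[j]
        if ci == cj:
            continue
        # Assumed acceptance rule: point i joins j's cluster when the two
        # points are close and j's cluster is at least as large as i's.
        close = math.dist(points[i], points[j]) <= radius
        if close and sizes[cj] >= sizes[ci]:
            labels[i] = cj
            sizes[ci] -= 1
            sizes[cj] += 1
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    found = stochastic_cluster(data, radius=1.0)
    print(len(set(found)), "clusters found")
```

Each accepted move strictly increases the sum of squared cluster sizes, so the search cannot cycle. On two well-separated groups like those in the usage example, the points of each group should end up sharing one label, while the two groups keep distinct labels because no cross-group pair lies within `radius`.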