New unsupervised clustering algorithm for large datasets

Authors:
William Peter, John Chiochetti, Clare Giardina
Affiliations:
BAE Systems Advanced Technologies, Columbia, MD (all three authors)
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

A fast and accurate unsupervised algorithm has been developed for clustering very large datasets. Though designed for large volumes of geospatial data, the algorithm is general enough to be used in a wide variety of application domains. The number of computations it requires scales as ~O(N), where N is the number of data points, which makes it faster than hierarchical algorithms. Unlike the popular K-means heuristic, the algorithm does not require a series of iterations to converge to a solution. Nor does it depend on initializing a given number of cluster representatives, so it is insensitive to initial conditions. Being unsupervised, the algorithm can also "rank" each cluster by density. The method weights the dataset onto the grid points of a mesh and then uses a small number of rule-based agents to find the high-density clusters. This effectively reduces a large dataset to the size of the grid, which is usually many orders of magnitude smaller. Numerical experiments demonstrate the advantages of this algorithm over other techniques.
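
The abstract does not say which weighting scheme or which agent rules are used, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes two-dimensional data, bilinear weighting of each point onto its four surrounding grid nodes (as is common in particle simulation codes), and a plain local-maximum scan standing in for the rule-based agents. The function names `deposit_to_grid` and `rank_dense_nodes` and all parameters are hypothetical.

```python
import random


def deposit_to_grid(points, nx, ny, xmin, xmax, ymin, ymax):
    """Spread each point's unit weight over its four nearest grid nodes.

    Assumed scheme: bilinear weighting. A single pass over the data,
    so the cost is O(N) in the number of points.
    """
    grid = [[0.0] * ny for _ in range(nx)]
    dx = (xmax - xmin) / (nx - 1)
    dy = (ymax - ymin) / (ny - 1)
    for x, y in points:
        fx = (x - xmin) / dx
        fy = (y - ymin) / dy
        # Index of the lower-left node of the enclosing cell, clamped
        # so that points outside the mesh are assigned to the boundary.
        i = min(max(int(fx), 0), nx - 2)
        j = min(max(int(fy), 0), ny - 2)
        wx = min(max(fx - i, 0.0), 1.0)
        wy = min(max(fy - j, 0.0), 1.0)
        grid[i][j] += (1 - wx) * (1 - wy)
        grid[i + 1][j] += wx * (1 - wy)
        grid[i][j + 1] += (1 - wx) * wy
        grid[i + 1][j + 1] += wx * wy
    return grid


def rank_dense_nodes(grid, threshold):
    """Return nodes that are local density maxima, densest first.

    Stand-in for the rule-based agents (an assumption): a node counts
    as a cluster centre if its density is at least `threshold` and no
    neighbouring node is denser. Cost depends on the grid size only.
    """
    nx, ny = len(grid), len(grid[0])
    peaks = []
    for i in range(nx):
        for j in range(ny):
            d = grid[i][j]
            if d < threshold:
                continue
            neighbours = [
                grid[a][b]
                for a in range(max(i - 1, 0), min(i + 2, nx))
                for b in range(max(j - 1, 0), min(j + 2, ny))
                if (a, b) != (i, j)
            ]
            if all(d >= n for n in neighbours):
                peaks.append((d, i, j))
    return sorted(peaks, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic blobs of different size on a 10 x 10 domain.
    pts = [(rng.gauss(2, 0.3), rng.gauss(2, 0.3)) for _ in range(5000)]
    pts += [(rng.gauss(7, 0.3), rng.gauss(7, 0.3)) for _ in range(2000)]
    grid = deposit_to_grid(pts, 11, 11, 0, 10, 0, 10)
    for density, i, j in rank_dense_nodes(grid, threshold=100):
        print(f"node ({i}, {j}) density {density:.1f}")
```

In this sketch the data are touched once, during the deposit step; the cluster search then works only on the `nx * ny` grid values, which illustrates how the dataset is reduced to the size of the grid. No iteration or initial choice of cluster centres is involved, and sorting the peaks by density gives a simple ranking.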