A fast and accurate unsupervised clustering algorithm has been developed for very large datasets. Although designed for very large volumes of geospatial data, the algorithm is general enough to be used in a wide variety of application domains. The number of computations it requires is approximately O(N), which makes it faster than hierarchical algorithms. Unlike the popular K-means heuristic, it does not require a series of iterations to converge to a solution. It also does not depend on initializing a given number of cluster representatives, and so is insensitive to initial conditions. Being unsupervised, the algorithm can also "rank" each cluster based on density. The method weights the dataset onto the grid points of a mesh and uses a small number of rule-based agents to find the high-density clusters. This effectively reduces a large dataset to the size of the grid, which is usually many orders of magnitude smaller. Numerical experiments demonstrate the advantages of this algorithm over other techniques.
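
The abstract does not state the weighting scheme or the agents' rules, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes two-dimensional data, nearest-grid-point weighting (plain cell counts via `numpy.histogram2d`), and a hypothetical hill-climbing agent rule in which an agent moves to its densest neighbouring cell until no neighbour is denser. The names `grid_density`, `climb`, and `find_clusters`, and the parameters `bins`, `bounds`, and `min_count`, are illustrative and not taken from the paper. The weighting pass touches each point once, and all later work is done on the grid rather than on the data.

```python
import numpy as np


def grid_density(points, bins, bounds):
    """Assumed nearest-grid-point weighting: count points per cell in one pass."""
    (xmin, xmax), (ymin, ymax) = bounds
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=bins, range=[[xmin, xmax], [ymin, ymax]],
    )
    return counts


def climb(counts, start):
    """Hypothetical agent rule: move to the densest of the 8 neighbours
    until no neighbour is strictly denser than the current cell."""
    i, j = int(start[0]), int(start[1])
    nx, ny = counts.shape
    while True:
        best = (i, j)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if 0 <= a < nx and 0 <= b < ny and counts[a, b] > counts[best]:
                    best = (a, b)
        if best == (i, j):
            return best
        i, j = best


def find_clusters(points, bins=20, bounds=((0.0, 1.0), (0.0, 1.0)), min_count=1):
    """Return (peak_cell, total_weight) pairs, heaviest first.

    Every occupied cell sends an agent uphill; cells whose agents stop at the
    same peak form one cluster. Ranking by gathered weight is an assumption.
    """
    counts = grid_density(points, bins, bounds)
    peaks = {}
    for cell in zip(*np.nonzero(counts >= min_count)):
        peak = climb(counts, cell)
        peaks[peak] = peaks.get(peak, 0.0) + float(counts[cell])
    return sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `find_clusters(points, bins=10)` on an `(N, 2)` array scaled to the unit square returns the peak cells with the weight each one attracts. No cluster count or initial representatives are supplied, which mirrors the insensitivity to initial conditions claimed above. Because the climb requires a strictly denser neighbour, flat regions of equal density split into separate peaks; a real implementation would need a rule for merging them.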