Clustering is one of the fundamental data mining tasks. Many clustering paradigms have been developed over the years, including partitional, hierarchical, mixture-model-based, density-based, spectral, and subspace methods. This paper focuses on full-dimensional clusters of arbitrary shape. Existing methods for this problem have quadratic or even cubic memory or time complexity, which has restricted them to datasets of moderate size. We propose SPARCL, a simple and scalable algorithm for finding clusters of arbitrary shape and size, with linear space and time complexity. SPARCL consists of two stages. The first stage runs a carefully initialized version of the K-means algorithm to generate many small seed clusters. The second stage iteratively merges the generated clusters to obtain the final shape-based clusters. Experiments on a variety of datasets highlight the effectiveness, efficiency, and scalability of our approach. On the large datasets, SPARCL is an order of magnitude faster than the best existing approaches.
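The two-stage idea described above can be illustrated with a rough sketch. This is not the SPARCL implementation: the abstract does not specify the initialization or the merge criterion, so the sketch assumes farthest-point seeding for the K-means stage and a single-link style rule (smallest point-to-point gap) for the merge stage. All function names and parameters (`farthest_point_seeds`, `two_stage_cluster`, `n_seeds`) are hypothetical, and the naive merge loop below does not reproduce the linear time complexity claimed for SPARCL.

```python
import numpy as np

def farthest_point_seeds(X, m):
    """Assumed initialization: greedy farthest-point seeding (deterministic)."""
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    idx = [first]
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(d))          # point farthest from all chosen seeds
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx].copy()

def kmeans(X, centers, iters=20):
    """Plain Lloyd iterations; returns the final point-to-center assignment."""
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):             # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def merge_seeds(X, labels, k):
    """Assumed merge rule: repeatedly join the two groups with the smallest gap."""
    groups = {int(j): np.flatnonzero(labels == j) for j in np.unique(labels)}
    while len(groups) > k:
        keys = list(groups)
        best = None
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                A, B = X[groups[keys[a]]], X[groups[keys[b]]]
                gap = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).min()
                if best is None or gap < best[0]:
                    best = (gap, keys[a], keys[b])
        _, a, b = best
        groups[a] = np.concatenate([groups[a], groups.pop(b)])
    out = np.empty(len(X), dtype=int)
    for new_label, members in enumerate(groups.values()):
        out[members] = new_label
    return out

def two_stage_cluster(X, k, n_seeds):
    """Stage 1: many small seed clusters. Stage 2: merge them down to k clusters."""
    seeds = farthest_point_seeds(X, n_seeds)
    seed_labels = kmeans(X, seeds)
    return merge_seeds(X, seed_labels, k)

def two_rings(n_inner=100, n_outer=200):
    """Toy data: two concentric rings, a shape plain K-means with k=2 cannot separate."""
    t1 = np.linspace(0, 2 * np.pi, n_inner, endpoint=False)
    t2 = np.linspace(0, 2 * np.pi, n_outer, endpoint=False)
    inner = np.c_[np.cos(t1), np.sin(t1)]
    outer = 10 * np.c_[np.cos(t2), np.sin(t2)]
    return np.vstack([inner, outer])
```

Calling `two_stage_cluster(two_rings(), k=2, n_seeds=20)` returns one label per point. The design point is that the number of seed clusters is much larger than the number of final clusters, so each seed covers only a small, roughly convex piece of a shape, and the merge stage reassembles those pieces.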