Clustering, i.e., the identification of regions of similar objects in a multi-dimensional data set, is a standard method of data analytics with a wide variety of applications. For high-dimensional data, subspace clustering can be used to find clusters within subsets of the data dimensions, which alleviates the curse of dimensionality. In this paper we focus on the MAFIA subspace clustering algorithm and on using GPUs to accelerate it. We first present a number of algorithmic changes and estimate their effect on the computational complexity of the algorithm. These changes reduce the computational complexity and accelerate the sequential version by 1–2 orders of magnitude on practical datasets while producing exactly the same output. We then present the GPU version of the algorithm, which for typical datasets provides a further 1–2 orders of magnitude speedup over a single CPU core, or about one order of magnitude over a typical multi-core CPU. We believe that our faster implementation widens the applicability of MAFIA and of subspace clustering.