GPUMAFIA: efficient subspace clustering with MAFIA on GPUs

Authors:
Andrew Adinetz; Jiri Kraus; Jan Meinke; Dirk Pleiter
Affiliations:
JSC, Forschungszentrum Jülich, Jülich, Germany, and Research Computing Center, Lomonosov Moscow State University, Russia; NVIDIA GmbH, Germany; JSC, Forschungszentrum Jülich, Jülich, Germany; JSC, Forschungszentrum Jülich, Jülich, Germany
Venue:
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Year:
2013

Abstract

Clustering, i.e., the identification of regions of similar objects in a multi-dimensional data set, is a standard method of data analytics with a large variety of applications. For high-dimensional data, subspace clustering can be used to find clusters among a certain subset of data point dimensions and alleviate the curse of dimensionality. In this paper we focus on the MAFIA subspace clustering algorithm and on using GPUs to accelerate the algorithm. We first present a number of algorithmic changes and estimate their effect on computational complexity of the algorithm. These changes improve the computational complexity of the algorithm and accelerate the sequential version by 1–2 orders of magnitude on practical datasets while providing exactly the same output. We then present the GPU version of the algorithm, which for typical datasets provides a further 1–2 orders of magnitude speedup over a single CPU core or about an order of magnitude over a typical multi-core CPU. We believe that our faster implementation widens the applicability of MAFIA and subspace clustering.