Fast Parallel Expectation Maximization for Gaussian Mixture Models on GPUs Using CUDA

Authors:
N. S. L. Phani Kumar;Sanjiv Satoor;Ian Buck
Affiliations:
-;-;-
Venue:
HPCC '09 Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications
Year:
2009

Citing 0
Cited 6

CUDA-level performance with python-level productivity for Gaussian mixture model applications

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Language identification using multi-core processors

Computer Speech and Language
Detecting multiple stochastic network motifs in network data

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Unexpected challenges in large scale machine learning

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Massively parallel expectation maximization using graphics processing units

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
A fast convergence clustering algorithm merging MCMC and EM methods

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Expectation Maximization (EM) algorithm is an iterative technique widely used in the fields of signal processing and data mining. We present a parallel implementation of EM for finding maximum likelihood estimates of parameters of Gaussian mixture models, designed for many-core architecture of Graphics Processing Units (GPU). The algorithm is implemented on NVIDIA's GPUs using CUDA, following the single instruction multiple threads model. In this paper, the emphasis is laid on exploiting the data parallelism with CUDA, thus accelerating the computations. CUDA implementation of EM is designed in such a way that the speed of computation of the algorithm scales up with the number of GPU cores. Experimental results confirm the scalability across cores. The results also show that CUDA implementation of EM when applied to an input of 230K for a 32-order mixture of 32-dimensional Gaussian model takes 264 msec on Quadro FX 5800 (NVIDIA 200 series) with 240 cores to complete one iteration which is about 164 times faster when compared to a naive single threaded C implementation on CPU.