Iterative statistical kernels on contemporary GPUs

  • Authors:
  • Thilina Gunarathne; Bimalee Salpitikorala; Arun Chauhan; Geoffrey Fox

  • Affiliations:
  • School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA (all authors)

  • Venue:
  • International Journal of Computational Science and Engineering
  • Year:
  • 2013

Abstract

We present a study of OpenCL implementations of three important kernels that occur frequently in iterative statistical applications: multi-dimensional scaling (MDS), PageRank and K-means clustering. We evaluated their performance on NVIDIA Tesla and Fermi GPGPU cards using dedicated hardware, and in the case of Fermi, also on the Amazon EC2 cloud-computing environment. We explored the optimisation of these kernels by four main techniques: (1) caching invariant data in GPU memory across iterations; (2) selectively placing data in different memory levels; (3) rearranging data in memory; (4) dividing the work between the GPU and the CPU. We also implemented a novel algorithm for MDS and a novel data layout scheme for PageRank. Our optimisations resulted in performance improvements of up to 5× to 6× over naïve OpenCL implementations and up to 100× over a single-core CPU. We believe that these categories of optimisations are also applicable to other similar kernels.
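To make the first technique concrete, the following is a minimal CPU-only sketch in Python, not the authors' OpenCL code. It shows the iteration structure that makes caching possible in PageRank: the link structure is invariant and is built once, while only the rank vector changes from one iteration to the next. The function name `pagerank`, its parameters, the damping factor of 0.85 and the uniform handling of dangling nodes are all illustrative assumptions, not details taken from the paper.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [target nodes]}.

    Assumes every node, including pure targets, appears as a key.
    """
    nodes = sorted(out_links)
    n = len(nodes)

    # Invariant data: built once and reused unchanged by every iteration.
    # In a GPU setting this is the part one would upload a single time.
    out_degree = {u: len(out_links[u]) for u in nodes}
    in_links = {v: [] for v in nodes}
    for u in nodes:
        for v in out_links[u]:
            in_links[v].append(u)

    # Variant data: the rank vector is the only state that changes.
    ranks = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(ranks[u] for u in nodes if out_degree[u] == 0)
        ranks = {
            v: (1.0 - damping) / n
            + damping * (dangling / n
                         + sum(ranks[u] / out_degree[u] for u in in_links[v]))
            for v in nodes
        }
    return ranks


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, rank in sorted(pagerank(graph).items()):
        print(node, round(rank, 4))
```

In this sketch, `in_links` and `out_degree` play the role of the data that would stay resident in device memory, so each iteration would only need to move the rank vector.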