C-DAC's efforts: application kernels on HPC cluster with GPU accelerators

  • Authors:
  • Vcv. Rao;Nisha Agrawal;Samrit Maity

  • Affiliations:
  • C-DAC, Pune University Campus, Pune, Maharashtra, India;C-DAC, Pune University Campus, Pune, Maharashtra, India;C-DAC, Pune University Campus, Pune, Maharashtra, India

  • Venue:
  • Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe the problem of parallelization of finite difference method (FDM) and finite element method (FEM) computations for certain class of partial differential equations (PDEs) on High Performance Computing (HPC) GPU cluster. For FDM, the structured grids have been employed and optimal data rearrangement operations are performed in GPU computations. For FEM, unstructured triangular and hexahedral meshes are generated and graph partitioning METIS [14] software is used to generate load-balanced sub-domains. The iterative methods have been used to solve result algebraic matrix system of linear equations. A combination of MPI with CUDA and OpenCL enabled NVIDIA as well as OpenCL based AMD-ATI GPUs of HPC GPU Cluster have been used in our experiments [4,6,7,8]. Our experiments indicate that the MPI-CUDA codes based on FDM and FEM achieves nearly 6x speed-ups for large mesh sizes in comparison to host-cpu implementation of the same code. The un-optimized OpenCL implementation GPU times have shown marginal improvement in speed-ups whereas counterpart the CUDA codes achieved maximum speedup of 4x to 6x on HPC GPU Cluster. We presented performance analysis for different mesh sizes that prove performance capabilities of performance and scalability of FDM and FEM computations GPU cluster.