Bridging parallel and reconfigurable computing with multilevel PGAS and SHMEM+

  • Authors:
  • V. Aggarwal;A. George;K. Yalamanchili;C. Yoon;H. Lam;G. Stitt

  • Affiliations:
  • University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL

  • Venue:
  • Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications
  • Year:
  • 2009

Abstract

Reconfigurable computing (RC) systems based on FPGAs are becoming an increasingly attractive solution for building the parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption compared with their traditional microprocessor-based counterparts. However, most such work has been limited to small system sizes. In contrast to traditional HPC systems, reconfigurable HPC systems lack integrated, system-wide parallel-programming models and languages, which presents a significant design challenge for creating scalable applications. In this paper, we introduce and investigate a novel programming model based on the Partitioned Global Address Space (PGAS), which simplifies development of parallel applications for such systems. The new multilevel PGAS programming model captures the unique characteristics of these systems, such as multiple levels of memory hierarchy and heterogeneous computation resources. To evaluate this multilevel PGAS model, we extend and adapt the SHMEM communication library into what we call SHMEM+, the first known SHMEM library enabling coordination between FPGAs and CPUs in a reconfigurable, heterogeneous HPC system. Our design of SHMEM+ is highly portable and provides peak communication bandwidth comparable to vendor-proprietary versions of SHMEM. In addition, designing applications with SHMEM+ improves developer productivity compared to current methods of multi-device RC design, and the resulting applications achieve a high degree of portability.