Integrating FPGAs in high-performance computing: the architecture and implementation perspective

  • Authors:
  • Nathan Woods

  • Affiliations:
  • XtremeData Inc., Schaumburg, IL

  • Venue:
  • Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays
  • Year:
  • 2007


Abstract

Today, many enterprises are evaluating, and in some cases deploying, heterogeneous computing platforms that include some form of hardware acceleration or co-processing. Such systems typically consist of commodity computing clusters augmented by hardware accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), gaming platform chips like the Cell Processor, and others.

In this presentation we will identify desirable architectural features of a computing platform that includes FPGA co-processing. We will also cover the practical system design issues that arise when introducing an FPGA into a modern x86 computing blade, including mechanical, power, and cooling issues.

The main body of the presentation will highlight the promise of these systems using a concrete example. I will describe the architecture, technical capabilities, and limitations of a commercially available FPGA coprocessor, the XD1000™ coprocessor module from XtremeData. The XD1000 integrates a Stratix™ II FPGA into a multi-Opteron™ system by replacing one of the Opteron CPUs with the FPGA coprocessor. The module communicates with the other Opteron CPUs on the motherboard via point-to-point HyperTransport (HT) links, each providing 3.2 GB/sec of bandwidth. The FPGA coprocessor interfaces directly with the motherboard's DDR SDRAM memory DIMMs, without the need for an intervening north bridge.

Important architectural considerations include whether to provide local on-module memory (e.g., on-module SRAM) and how to interface to it, the ramifications of coherent vs. noncoherent HT protocol support on the coprocessor module, the coprocessor programming interface for the host system, FPGA configuration, and monitoring and test support for user debug. A block diagram of a system employing the XD1000 coprocessor is shown in Figure 1.
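The abstract quotes 3.2 GB/sec of bandwidth per HT link. As a rough illustration of what that figure implies when moving data to the FPGA coprocessor, the Python sketch below computes an idealized transfer time. It assumes (not stated in the abstract) that the quoted number is usable one-way bandwidth, that 1 GB means 10^9 bytes, and that protocol overhead and latency can be ignored. The constant and function names are hypothetical and are not part of any XtremeData interface.

```python
# Back-of-envelope estimate of HT transfer time to the FPGA coprocessor.
# Assumptions (hypothetical, not from the abstract): the quoted 3.2 GB/sec is
# usable one-way bandwidth, 1 GB = 1e9 bytes, and packet overhead and link
# latency are ignored.

HT_LINK_BANDWIDTH_BYTES_PER_S = 3.2e9  # per-link figure quoted in the abstract


def transfer_time_s(num_bytes: int) -> float:
    """Idealized seconds needed to move num_bytes over a single HT link."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return num_bytes / HT_LINK_BANDWIDTH_BYTES_PER_S


if __name__ == "__main__":
    one_gib = 1 << 30  # 2**30 bytes
    print(f"1 GiB over one link: {transfer_time_s(one_gib) * 1e3:.1f} ms")
```

Under these assumptions, moving 1 GiB over one link takes about 335.5 ms. Achievable throughput would likely be lower once HT packet overhead is taken into account, so the figure is best read as a lower bound on transfer time.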