Prototyping hardware support for irregular applications

  • Authors:
  • Marco Ceriani; Simone Secchi; Antonino Tumeo; Oreste Villa

  • Affiliations:
  • Pacific Northwest National Laboratory, Richland, WA; Università di Cagliari, Cagliari, Italy; Pacific Northwest National Laboratory, Richland, WA; Pacific Northwest National Laboratory, Richland, WA

  • Venue:
  • Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
  • Year:
  • 2013

Abstract

The use of FPGA platforms developed with off-the-shelf soft cores has recently emerged as one of the most promising fast-prototyping approaches for designing, evaluating, and validating new architectural components for multi- and many-core processors. The approach appears to provide valuable benefits: optimizations to complex designs can be evaluated directly in hardware, at speeds hundreds of times faster than simulation, with effort apparently limited to the development of the new components. However, current FPGA toolchains that allow quick deployment of system-on-chip designs still have trouble implementing multiprocessor designs. Often, significant effort is also required to address the limitations of these toolchains. In this paper we discuss the design of a multi-node FPGA prototype, developed with the Xilinx toolchain, for exploring components to optimize multi- and many-core processors for the execution of irregular applications. Irregular applications, such as data mining and social network analysis, employ large, pointer-based data structures (graphs, unbalanced trees, unstructured grids) that exhibit poor locality and are very difficult to partition. Commodity clusters, which integrate powerful multi-core cache-based processors, are optimized for locality and employ distributed-memory programming models. Developing irregular applications on them is complex and often does not yield performance scaling. We designed a set of hardware/software components that can potentially enhance commodity processors for efficiently executing irregular applications on multi-node systems, and we have integrated and validated them by exploiting FPGA rapid prototyping. We present the components and the prototype, highlighting the benefits and challenges of using such an approach for architectural studies.
We present an initial study on the tradeoffs of the platform, showing how prototyping can be effective, but also underlining the aspects that still need to be improved in the toolchain to allow better and deeper analysis.