Prototyping hardware support for irregular applications

Authors:
Marco Ceriani; Simone Secchi; Antonino Tumeo; Oreste Villa
Affiliations:
Pacific Northwest National Laboratory, Richland, WA; Università di Cagliari, Cagliari, Italy; Pacific Northwest National Laboratory, Richland, WA; Pacific Northwest National Laboratory, Richland, WA
Venue:
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Year:
2013

Abstract

The use of FPGA platforms built with off-the-shelf soft cores has recently emerged as one of the most promising rapid prototyping approaches for designing, evaluating and validating new architectural components for multi- and many-core processors. The approach appears to offer valuable benefits: optimizations to complex designs can be evaluated directly in hardware, at speeds hundreds of times faster than simulation, with effort apparently limited to developing the new components. However, current FPGA toolchains that allow quick deployment of system-on-chip designs still have difficulty implementing multiprocessor designs, and significant effort is often required to work around their limitations. In this paper we discuss the design of a multi-node FPGA prototype, developed with the Xilinx toolchain, for exploring components that optimize multi- and many-core processors for the execution of irregular applications. Irregular applications, such as data mining and social network analysis, employ large, pointer-based data structures (graphs, unbalanced trees, unstructured grids) that exhibit poor locality and are very difficult to partition. Commodity clusters, which integrate powerful multi-core cache-based processors, are optimized for locality and employ distributed-memory programming models. Developing irregular applications on them is complex and often fails to deliver scalable performance. We designed a set of hardware/software components that can potentially enhance commodity processors for the efficient execution of irregular applications on multi-node systems, and we integrated and validated them by exploiting FPGA rapid prototyping. We present the components and the prototype, highlighting the benefits and challenges of using such an approach for architectural studies.
We present an initial study of the platform's tradeoffs, showing how prototyping can be effective while also highlighting the aspects of the toolchain that still need improvement to allow broader and deeper analysis.
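To make the locality argument concrete, the following is a minimal sketch (not taken from the paper) of breadth-first search over a graph stored in compressed sparse row (CSR) form, a common layout for the large pointer-based structures the abstract describes. The function name bfs_csr, the arrays row_ptr and col_idx, and the small example graph are illustrative assumptions. Each neighbor list is read sequentially, but every level[v] lookup is an indirect access whose address depends on a value just loaded from memory; on graphs much larger than the caches, such accesses scatter across memory and are hard to predict or prefetch, which is the behavior locality-optimized, cache-based processors handle poorly.

```python
from collections import deque


def bfs_csr(row_ptr, col_idx, source):
    """Breadth-first search over a graph in CSR form (illustrative sketch).

    row_ptr[u]..row_ptr[u+1] delimits u's neighbors inside col_idx.
    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        # The neighbor list of u is contiguous in col_idx (regular access)...
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[k]
            # ...but level[v] is an indirect, data-dependent access: its
            # address is unknown until col_idx[k] has been loaded.
            if level[v] == -1:
                level[v] = level[u] + 1
                frontier.append(v)
    return level


# Hypothetical 6-vertex undirected graph (each edge stored in both directions):
# 0-3, 0-5, 1-2, 1-4, 2-5, 3-4
row_ptr = [0, 2, 4, 6, 8, 10, 12]
col_idx = [3, 5, 2, 4, 1, 5, 0, 4, 1, 3, 0, 2]
print(bfs_csr(row_ptr, col_idx, 0))  # [0, 3, 2, 1, 2, 1]
```

The same indirection is what makes such graphs hard to partition across the nodes of a distributed-memory cluster: which vertices a traversal touches next is only known at run time.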