RACECAR: a heuristic for automatic function specialization on multi-core heterogeneous systems

Authors:
John R. Wernsing; Greg Stitt
Affiliations:
University of Florida, Gainesville, FL, USA;University of Florida, Gainesville, FL, USA
Venue:
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Year:
2012

Abstract

High-performance computing systems increasingly combine multi-core processors with heterogeneous resources such as graphics-processing units and field-programmable gate arrays. However, the significant complexity of designing applications for such systems often leaves performance potential untapped. Application designers targeting such systems currently must determine how to parallelize computation, create device-specialized implementations for each heterogeneous resource, and determine how to partition work across the resources. In this paper, we present the RACECAR heuristic, which automates the optimization of applications for multi-core heterogeneous systems by exploring implementation alternatives that include different algorithms, parallelization strategies, and work distributions. Experimental results show that RACECAR-specialized implementations achieve speedups of up to 117x, and 11x on average, compared to a single CPU thread when parallelizing computation across multiple cores, graphics-processing units, and field-programmable gate arrays.