A taxonomy of accelerator architectures and their programming models

Authors:
C. Caşcaval;S. Chatterjee;H. Franke;K. J. Gildea;P. Pattnaik
Affiliations:
Qualcomm Research, Santa Clara, CA;IBM Systems and Technology Group, Austin, TX;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Systems and Technology Group, Poughkeepsie, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY
Venue:
IBM Journal of Research and Development
Year:
2010

Abstract

As the clock frequency of silicon chips levels off, the computer architecture community is looking for other ways to continue scaling application performance. One such solution is the multicore approach, i.e., using multiple simple cores that together deliver higher performance than a wide superscalar processor, provided that the workload can exploit the parallelism. Another emerging alternative is the use of customized designs (accelerators) at different levels within the system: specialized functional units integrated with the core, specialized cores, attached processors, or attached appliances. The design tradeoff is compelling because current processor chips have billions of transistors, but these cannot all be activated or switched at the same time at high frequencies. Specialized designs provide increased power efficiency but cannot be used as general-purpose compute engines. Therefore, architects trade area for power efficiency by adding units that are known to be active at different times. The resulting system is a heterogeneous architecture, with the potential for specialized execution that accelerates different workloads. While designing and building such hardware systems is attractive, writing and porting software to a heterogeneous platform is even more challenging than exploiting parallelism on homogeneous multicore systems. In this paper, we propose a taxonomy that allows us to define classes of accelerators, with the goal of focusing on a small set of programming models for accelerators. We discuss several types of currently popular accelerators and identify challenges to exploiting such accelerators in current software stacks. This paper serves as a guide both for hardware designers, by providing them with a view of how software best exploits specialization, and for software programmers, by focusing research efforts on addressing parallelism and heterogeneity.