A memory interface for multi-purpose multi-stream accelerators

Authors:
Sylvain Girbal; Olivier Temam; Sami Yehia; Hugues Berry; Zheng LI
Affiliations:
Thales Research and Technology, Palaiseau, France; INRIA Saclay, Orsay, France; Thales Research and Technology, Palaiseau, France; INRIA Saclay, Orsay, France; INRIA Saclay, Orsay, France
Venue:
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2010

Abstract

Power and programming challenges make heterogeneous multi-cores, composed of cores and ASICs, an attractive alternative to homogeneous multi-cores. Recently, multi-purpose loop-based generated accelerators have emerged as an especially attractive accelerator option. They have several assets: short design time (automatic generation), flexibility (multi-purpose) with low configuration and routing overhead (unlike FPGAs), computational performance (operations are directly mapped to hardware), and a focus on memory throughput achieved by leveraging loop constructs. However, with multiple streams, the memory behavior of such accelerators can become at least as complex as that of superscalar processors, while they still need to retain the memory ordering predictability and throughput efficiency of DMAs. In this article, we show how to design a memory interface for multi-purpose accelerators that combines the ordering predictability of DMAs, retains key efficiency features of memory systems for complex processors, and requires only a fraction of their cost by leveraging the properties of stream references. We evaluate the approach with a synthesizable version of the memory interface for an example 9-task generated loop-based accelerator.