Fast shared on-chip memory architecture for efficient hybrid computing with CGRAs

Authors:
Jongeun Lee; Yeonghun Jeong; Sungsok Seo
Affiliations:
Ulsan National Institute of Science and Technology, Ulsan, Korea; Ulsan National Institute of Science and Technology, Ulsan, Korea; Ulsan National Institute of Science and Technology, Ulsan, Korea
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2013



Abstract

Coarse-Grained Reconfigurable Architectures (CGRAs) are very efficient at handling regular, compute-intensive loops, but their weakness in control-intensive processing and their need for frequent reconfiguration call for a companion processor, a role usually filled by the main processor. To minimize the overhead of such collaborative execution, we integrate a dedicated sequential processor (SP) with a reconfigurable array (RA); the crucial problem is how to share memory between the SP and the RA while keeping the SP's memory access latency very short. We present the architecture, control, and a program example of our approach in detail, focusing on our optimized on-chip shared memory organization between the SP and the RA. Our preliminary results show that the optimized memory architecture is very effective at reducing kernel execution time (by 23.5% compared with a more straightforward alternative). Our approach also significantly reduces the RA control overhead and the execution time of other sequential code in kernels, yielding up to a 23.1% reduction in kernel execution time compared with a conventional system that uses the main processor for sequential code execution.