Memory access optimization in compilation for coarse-grained reconfigurable architectures

Authors:
Yongjoo Kim; Jongeun Lee; Aviral Shrivastava; Yunheung Paek
Affiliations:
Seoul National University, Seoul, Korea; Ulsan National Institute of Science and Technology, Ulsan, Korea; Arizona State University; Seoul National University, Seoul, Korea
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2011

Abstract

Coarse-grained reconfigurable architectures (CGRAs) promise high performance at high power efficiency. They fulfil this promise by keeping the hardware extremely simple and moving the complexity to application mapping. One major challenge is data mapping. For reasons of power efficiency and complexity, CGRAs use multibank local memory, and each row of processing elements (PEs) shares memory access. So that each row of PEs can access any memory bank, a hardware arbiter sits between the memory requests generated by the PEs and the banks of the local memory. However, a fundamental restriction remains: a bank cannot be accessed by two different PEs at the same time. We propose to meet this challenge by mapping application operations onto PEs and data into memory banks in a way that avoids such conflicts. To further improve performance on multibank memories, we propose a compiler optimization for CGRA mapping that reduces the number of memory operations by exploiting data reuse. Our experimental results on kernels from multimedia benchmarks demonstrate that our local-memory-aware compilation approach can generate mappings that perform up to 53% better (26% on average) than those of a memory-unaware scheduler.
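
The bank-conflict restriction described in the abstract can be illustrated with a small sketch. This is not the mapping algorithm of the paper; it is a simplified, hypothetical model that assumes each array lives entirely in one bank and that the schedule (which arrays are accessed in which cycle) is already fixed. Under those assumptions, placing arrays into banks resembles graph colouring: arrays accessed in the same cycle should get different banks. The function names `conflict_pairs`, `assign_banks`, and `count_conflicts`, and the example schedule, are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def conflict_pairs(schedule):
    """Return the set of array pairs that are accessed in the same cycle.

    schedule: dict mapping a cycle number to the list of arrays accessed
    in that cycle (one entry per memory operation). Hypothetical format.
    """
    pairs = set()
    for accessed in schedule.values():
        for a, b in combinations(sorted(set(accessed)), 2):
            pairs.add((a, b))
    return pairs


def assign_banks(schedule, num_banks):
    """Greedy colouring: give simultaneously accessed arrays different banks.

    Returns a dict array -> bank index, or None if this greedy heuristic
    cannot find a conflict-free placement with num_banks banks.
    """
    neighbours = defaultdict(set)
    arrays = set()
    for accessed in schedule.values():
        arrays.update(accessed)
    for a, b in conflict_pairs(schedule):
        neighbours[a].add(b)
        neighbours[b].add(a)

    banks = {}
    # Place the most constrained arrays first; ties broken by name.
    for arr in sorted(arrays, key=lambda x: (-len(neighbours[x]), x)):
        used = {banks[n] for n in neighbours[arr] if n in banks}
        free = [b for b in range(num_banks) if b not in used]
        if not free:
            return None
        banks[arr] = free[0]
    return banks


def count_conflicts(schedule, banks):
    """Count accesses that collide with another access to the same bank
    in the same cycle (each collision would cost a stall in this model)."""
    collisions = 0
    for accessed in schedule.values():
        hit = [banks[a] for a in accessed]
        collisions += len(hit) - len(set(hit))
    return collisions


# Hypothetical three-cycle schedule with two memory operations per cycle.
schedule = {0: ["A", "B"], 1: ["B", "C"], 2: ["C", "D"]}

aware = assign_banks(schedule, num_banks=2)
unaware = {name: 0 for name in "ABCD"}  # everything placed in bank 0

print("memory-aware placement:", aware)
print("conflicts (aware):  ", count_conflicts(schedule, aware))
print("conflicts (unaware):", count_conflicts(schedule, unaware))
```

In this toy model, two accesses to the same array in one cycle always collide, so bank placement alone cannot remove them. That case needs the operations themselves to be rescheduled or, where the same data is read repeatedly, the redundant load to be eliminated through reuse, which is the role of the second optimization mentioned in the abstract.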