An instruction-scheduling-aware data partitioning technique for coarse-grained reconfigurable architectures

Authors:
Choonki Jang; Jungwon Kim; Jaejin Lee; Hee-Seok Kim; Dong-Hoon Yoo; Sukjin Kim; Hong-Seok Kim; Soojung Ryu
Affiliations:
Seoul National University, Seoul, South Korea; Seoul National University, Seoul, South Korea; Seoul National University, Seoul, South Korea; University of Illinois at Urbana-Champaign, Urbana, IL, USA; Samsung Electronics, Giheung, South Korea; Samsung Electronics, Giheung, South Korea; Microsoft Corporation, Redmond, WA, USA; Samsung Electronics, Giheung, South Korea
Venue:
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Year:
2011

Abstract

In this paper, we propose a data partitioning technique for a memory subsystem that consists of a multi-ported scratchpad memory (SPM) unit and a single-ported data cache in a coarse-grained reconfigurable array (CGRA) architecture. To achieve high performance, the embedded reconfigurable processor executes programs by switching between the Non-VLIW and VLIW modes depending on the type of code region. The VLIW mode targets code regions with high instruction-level parallelism (ILP), which require high memory bandwidth, and the Non-VLIW mode targets those with low ILP, which require low memory latency. Our technique partitions data between the SPM and the data cache using data interference graph reduction and profiling information. Given an SPM size, it finds the optimal data partitions by taking the VLIW instruction schedule into consideration. We evaluate the technique for the CGRA architecture with three representative multimedia applications.
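
To make the partitioning problem concrete, the sketch below shows one simplified way to split data objects between an SPM of a given size and the data cache using profiled access counts. It is not the algorithm from the paper: it ignores the data interference graph and the VLIW instruction schedule and treats the choice as a plain 0/1 knapsack. The function name `partition_for_spm`, the object names, the sizes, and the access counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def partition_for_spm(
    objects: Dict[str, Tuple[int, int]], spm_size: int
) -> Tuple[List[str], List[str]]:
    """Split data objects between the SPM and the data cache.

    objects maps a (hypothetical) object name to
    (size_in_bytes, profiled_access_count).
    Returns (names placed in the SPM, names left to the data cache).

    Simplifying assumption: the benefit of placing an object in the SPM
    is just its profiled access count, so this is a 0/1 knapsack solved
    by dynamic programming over the SPM capacity.
    """
    names = sorted(objects)
    # best[c] = (total accesses served by the SPM, chosen names)
    # for the best selection that fits in c bytes.
    best: List[Tuple[int, Tuple[str, ...]]] = [(0, ())] * (spm_size + 1)
    for name in names:
        size, accesses = objects[name]
        # Iterate capacities downward so each object is used at most once.
        for cap in range(spm_size, size - 1, -1):
            candidate = best[cap - size][0] + accesses
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + (name,))
    in_spm = list(best[spm_size][1])
    in_cache = [n for n in names if n not in in_spm]
    return in_spm, in_cache


# Hypothetical profile: name -> (size in bytes, access count).
profile = {
    "coeffs": (256, 9000),
    "frame": (4096, 12000),
    "lut": (512, 7000),
    "tmp": (1024, 3000),
}
spm, cache = partition_for_spm(profile, spm_size=2048)
print("SPM:", spm)
print("cache:", cache)
```

A schedule-aware partitioner would replace the per-object access count with a benefit derived from the instruction schedule, for example by accounting for objects that are accessed in the same cycle and therefore compete for memory ports.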