Configurable range memory for effective data reuse on programmable accelerators

Authors:
Jongeun Lee;Seongseok Seo;Jongkyung Paek;Kiyoung Choi
Affiliations:
UNIST, Ulsan, Korea;UNIST, Ulsan, Korea;Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2014

Citing 26
Cited 0

A data cache with multiple caching strategies tuned to different types of locality

ICS '95 Proceedings of the 9th international conference on Supercomputing
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications

IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective

Proceedings of the conference on Design, automation and test in Europe
Scratchpad memory: design alternative for cache on-chip memory in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
StrongARM: a high-performance ARM processor

COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Memory resource management in VMware ESX server

ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study

Proceedings of the conference on Design, automation and test in Europe - Volume 2
Dynamic overlay of scratchpad memory for energy minimization

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization

Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Compilation techniques for energy reduction in horizontally partitioned cache architectures

Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
DRAMsim: a memory system simulator

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Reconfigurable split data caches: a novel scheme for embedded systems

Proceedings of the 2007 ACM symposium on Applied computing
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor

International Journal of Parallel Programming
A Coarse-Grained Array Accelerator for Software-Defined Radio Baseband Processing

IEEE Micro
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Computer Organization and Design, Fourth Edition, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)

Computer Organization and Design, Fourth Edition, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)
PSMalloc: content based memory management for MPI applications

Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Operation and data mapping for CGRAs with multi-bank memory

Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Automatic CPU-GPU communication management and optimization

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Efficient data streaming with on-chip accelerators: Opportunities and challenges

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Improving performance of nested loops on reconfigurable array processors

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Memory-centric communication architecture for reconfigurable computing

ARC'10 Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications
High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

While programmable accelerators such as application-specific processors and reconfigurable architectures can dramatically speed up compute-intensive kernels of an application, application performance can still be severely limited by the communication between processors. To minimize the communication overhead, a shared memory such as a scratchpad memory may be employed between the main processor and the accelerator coprocessor. However, this setup poses a significant challenge to the main processor, which now must manage data on the scratchpad explicitly, resulting in superfluous data copying due to the inflexibility of a scratchpad. In this article, we present an enhancement of a scratchpad, Configurable Range Memory (CRM), whose address range can be reprogrammed to minimize unnecessary data copying between processors and therefore promote data reuse on the accelerator, and also present a software management algorithm for the CRM. Our experimental results involving detailed simulation of full multimedia applications demonstrate that our CRM architecture can reduce the communication overhead quite effectively, reducing the kernel execution time by up to 28% and the application runtime by up to 12.8%, in addition to considerable system energy reduction, compared to the conventional architecture based on a scratchpad.