Hybrid compile and run-time memory management for a 3D-stacked reconfigurable accelerator

Authors:
Lovic Gauthier;Shinya Ueno;Koji Inoue
Affiliations:
Kyushu University, Fukuoka, Japan;Kyushu University, Fukuoka, Japan;Kyushu University, Fukuoka, Japan
Venue:
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Year:
2013

Abstract

This paper presents a hybrid compile-time and run-time memory management technique for a 3D-stacked reconfigurable accelerator. The accelerator includes a memory layer composed of multiple memory units, and accessing these units in parallel provides very high bandwidth. To use the memory layer, the technique inserts allocations, frees, and data transfers into the code, and it avoids memory overflows by adding a limited number of additional copies to and from the host memory. When compile-time information is lacking, the technique relies on run-time decisions to control these memory operations. Experiments show that, compared to a pessimistic approach, the overhead of avoiding overflows can be cut on average by 27%, 45%, and 63% when the size of each memory unit is 1 kB, 128 kB, and 1 MB, respectively.
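
The split between compile-time and run-time decisions can be illustrated with a toy model. The sketch below is not the paper's algorithm. The class `HybridAllocator`, the flag `fits_proven_at_compile_time`, and the oldest-first eviction policy are all assumptions made for illustration. It only shows the general idea: an allocation that is statically known to fit needs no check, while one whose fit is unknown triggers a run-time check that may add copies back to host memory.

```python
class HybridAllocator:
    """Toy model of one memory unit (hypothetical, for illustration only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # size of the memory unit in bytes
        self.buffers = {}             # buffer name -> size, in allocation order
        self.extra_copies = 0         # host copies added only to avoid overflow

    def used(self) -> int:
        return sum(self.buffers.values())

    def free(self, name: str) -> None:
        del self.buffers[name]

    def alloc(self, name: str, size: int,
              fits_proven_at_compile_time: bool = False) -> None:
        if size > self.capacity:
            raise ValueError("buffer larger than the memory unit")
        if fits_proven_at_compile_time:
            # Static case: the compiler already showed there is room,
            # so no run-time check or extra copy is inserted.
            assert self.used() + size <= self.capacity
        else:
            # Dynamic case: decide at run time. Evict the oldest buffers
            # (assumed policy) until the new one fits; each eviction
            # stands for one extra copy back to host memory.
            while self.capacity - self.used() < size:
                victim = next(iter(self.buffers))
                self.free(victim)
                self.extra_copies += 1
        self.buffers[name] = size


unit = HybridAllocator(capacity=1024)
unit.alloc("x", 512, fits_proven_at_compile_time=True)
unit.alloc("y", 512, fits_proven_at_compile_time=True)
unit.alloc("z", 600)  # fit unknown at compile time: run-time check evicts x and y
print(unit.extra_copies, sorted(unit.buffers))
```

In this model, a pessimistic scheme would correspond to treating every allocation as the dynamic case, or to copying data back to the host unconditionally. The savings reported in the abstract come from avoiding such copies when they are not needed.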