Points-to analysis in almost linear time
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Using static single assignment form to improve flow-insensitive pointer analysis
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Virtual memory window for application-specific reconfigurable coprocessors
Proceedings of the 41st annual Design Automation Conference
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Bridging the Processor-Memory Performance Gapwith 3D IC Technology
IEEE Design & Test
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Efficient dynamic heap allocation of scratch-pad memory
Proceedings of the 7th international symposium on Memory management
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
ACM Transactions on Embedded Computing Systems (TECS)
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Improving VLIW Processor Performance Using Three-Dimensional (3D) DRAM Stacking
ASAP '09 Proceedings of the 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors
Combining multicore and reconfigurable instruction set extensions
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Memory access optimization in compilation for coarse-grained reconfigurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Elastic pipeline: addressing GPU on-chip shared memory bank conflicts
Proceedings of the 8th ACM International Conference on Computing Frontiers
CudaDMA: optimizing GPU memory bandwidth via warp specialization
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
FlexTiles: self adaptive heterogeneous manycore based on flexible tiles (FP7 project)
Proceedings of the 2012 Interconnection Network Architecture: On-Chip, Multi-Chip Workshop
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Proceedings of the 49th Annual Design Automation Conference
Hi-index | 0.00 |
This paper presents a hybrid compile and run-time memory management technique for a 3D-stacked reconfigurable accelerator including a memory layer composed of multiple memory units whose parallel access allows a very high bandwidth. The technique inserts allocation, free and data transfers into the code for using the memory layer and avoids memory overflows by adding a limited number of additional copies to and from the host memory. When compile-time information is lacking, the technique relies on run-time decisions for controlling these memory operations. Experiments show that, compared to a pessimistic approach, the overhead for avoiding overflows can be cut on average by 27%, 45% and 63% when the size of each memory unit is respectively 1kB, 128kB and 1MB.