Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches
IEEE Transactions on Computers
MorphoSys: case study of a reconfigurable computing system targeting multimedia applications
Proceedings of the 37th Annual Design Automation Conference
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
An algorithm for mapping loops onto coarse-grained reconfigurable architectures
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Area efficient layouts of binary trees in grids
Area efficient layouts of binary trees in grids
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Alleviating the Data Memory Bandwidth Bottleneck in Coarse-Grained Reconfigurable Arrays
ASAP '05 Proceedings of the 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors
A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Memory-Aware application mapping on coarse-grained reconfigurable arrays
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Memory access optimization in compilation for coarse-grained reconfigurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Vector class on limited local memory (LLM) multi-core processors
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Exploiting both pipelining and data parallelism with SIMD reconfigurable architecture
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Configurable range memory for effective data reuse on programmable accelerators
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
Coarse Grain Reconfigurable Architectures (CGRAs) promise high performance at high power efficiency. They fulfil this promise by keeping the hardware extremely simple, and moving the complexity to application mapping. One major challenge comes in the form of data mapping. For reasons of power-efficiency and complexity, CGRAs use multi-bank local memory, and a row of PEs share memory access. In order for each row of the PEs to access any memory bank, there is a hardware arbiter between the memory requests generated by the PEs and the banks of the local memory. However, a fundamental restriction remains that a bank cannot be accessed by two different PEs at the same time. We propose to meet this challenge by mapping application operations onto PEs and data into memory banks in a way that avoids such conflicts. Our experimental results on kernels from multimedia benchmarks demonstrate that our local memory-aware compilation approach can generate mappings that are up to 40% better in performance (17.3% on average) compared to a memory-unaware scheduler.