On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
DRIM: a low power dynamically reconfigurable instruction memory hierarchy for embedded systems
Proceedings of the conference on Design, automation and test in Europe
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
As the size of FPGA devices grows following Moore's law, it becomes possible to put a complete manycore system onto a single FPGA chip. The centralized memory hierarchy on typical embedded systems in which both data and instructions are stored in the off-chip global memory will introduce the bus contention problem as the number of processing cores increases. In this work, we present our exploration into how distributed multi-tiered memory hierarchies can effect the scalability of manycore systems. We use the Xilinx Virtex FPGA devices as the testing platforms and the buses as the interconnect. Several variances of the centralized memory hierarchy and the distributed memory hierarchy are compared by running various benchmarks, including matrix multiplication, IDEA encryption and 3D FFT. The results demonstrate the good scalability of the distributed memory hierarchy for systems up to 32 MicroBlaze processors, which is constrained by the FPGA resources on the Virtex-6LX240T device.