ACM SIGOPS Operating Systems Review
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
Latin Squares for Parallel Array Access
IEEE Transactions on Parallel and Distributed Systems
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exploring the cache design space for large scale CMPs
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
An open-loop flow control scheme based on the accurate global information of on-chip communication
Proceedings of the conference on Design, automation and test in Europe
An SDRAM-aware router for Networks-on-Chip
Proceedings of the 46th Annual Design Automation Conference
A Low-Latency and Memory-Efficient On-chip Network
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
A Network Congestion-Aware Memory Controller
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the Conference on Design, Automation and Test in Europe
An SDRAM-aware router for networks-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special section on the ACM IEEE international conference on formal methods and models for codesign (MEMOCODE) 2009
Hierarchical memory scheduling for multimedia MPSoCs
Proceedings of the International Conference on Computer-Aided Design
CARS: congestion-aware request scheduler for network interfaces in NoC-based manycore systems
Proceedings of the Conference on Design, Automation and Test in Europe
A network congestion-aware memory subsystem for manycore
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Hi-index | 0.00 |
3D stacked memory enables more off-chip DDR memories. Redesigning existing IPs to exploit the increased memory parallelism will be prohibitively costly. In our work, we propose a practical approach to exploit the increased bandwidth and reduced latency of multiple off-chip DDR memories while reusing existing IPs without modification. The proposed approach is based on two new concepts: transaction id renaming and distributed soft arbitration. We present two on-chip network components, request parallelizer and read data serializer, to realize the concepts. Experiments with synthetic test cases and an industrial strength DTV SoC design show that the proposed approach gives significant improvements in total execution cycle (21.6%) and average memory access latency (31.6%) in the DTV case with a small area overhead (30.1% in the on-chip network, and less than 1.4% in the entire chip).