The Stanford Dash Multiprocessor
Computer
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A recursive algorithm for low-power memory partitioning
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
Improving the Efficiency of Memory Partitioning by Address Clustering
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Challenges in Embedded Memory Design and Test
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures
IEEE Transactions on Computers
Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Dynamic partitioning of processing and memory resources in embedded MPSoC architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Multi-Level On-Chip Memory Hierarchy Design for Embedded Chip Multiprocessors
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the Conference on Design, Automation and Test in Europe
A compiler-based approach for dynamically managing scratch-pad memories in embedded systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for those data with changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%.