The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Networks on chip
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Challenges in Embedded Memory Design and Test
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures
IEEE Transactions on Computers
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Microprogramming: A Tutorial and Survey of Recent Developments
IEEE Transactions on Computers
Proceedings of the Third International Workshop on Network on Chip Architectures
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Computers and Electrical Engineering
A divide and conquer based distributed run-time mapping methodology for many-core platforms
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Power-aware dynamic memory management on many-core platforms utilizing DVFS
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes
Microprocessors & Microsystems
Direct distributed memory access for CMPs
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Supporting Distributed Shared Memory (DSM) is essential for multi-core Network-on-Chips for the sake of reusing huge amount of legacy code and easy programmability. We propose a microcoded controller as a hardware module in each node to connect the core, the local memory and the network. The controller is programmable where the DSM functions such as virtual-to-physical address translation, memory access and synchronization etc. are realized using microcode. To enable concurrent processing of memory requests from the local and remote cores, our controller features two mini-processors, one dealing with requests from the local core and the other from remote cores. Synthesis results suggest that the controller consumes 51k gates for the logic and can run up to 455 MHz in 130 nm technology. To evaluate its performance, we use synthetic and application workloads. Results show that, when the system size is scaled up, the delay overhead incurred by the controller may become less significant when compared with the network delay. In this way, the delay efficiency of our DSM solution is close to hardware solutions on average but still have all the flexibility of software solutions.