The shared-memory model has been adopted for both data exchange and synchronization (using semaphores) in almost every on-chip multiprocessor implementation, ranging from general-purpose chip multiprocessors (CMPs) to domain-specific multi-core graphics processing units (GPUs). Low-latency synchronization is desirable but hard to achieve in practice because of the memory hierarchy. By contrast, an explicit exchange of synchronization tokens among the processing elements through dedicated on-chip links would benefit overall system performance. In this paper we propose Medea, a NoC-based framework that takes a hybrid shared-memory/message-passing approach. Medea has been modeled with a fast, cycle-accurate SystemC implementation, which enables rapid system exploration across parameters such as the number and type of cores, cache size and policy, and NoC features. In addition, every SystemC block has an RTL counterpart for physical implementation on FPGAs and ASICs. A parallel version of the Jacobi algorithm has been used as a test application to validate the methodology. Results confirm expectations about performance and about the effectiveness of system exploration and design.
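The hybrid idea can be illustrated in software with a minimal sketch; it is not the Medea implementation. It assumes a 1-D Jacobi relaxation with fixed end points, uses Python threads in place of cores, two plain lists in place of shared memory, and `queue.Queue` objects in place of the dedicated token links. All names (`jacobi_hybrid`, `jacobi_serial`, `inbox`) are hypothetical and chosen only for illustration.

```python
import queue
import threading


def jacobi_serial(u, sweeps):
    """Reference result: 1-D Jacobi relaxation with fixed end points."""
    old = list(u)
    for _ in range(sweeps):
        new = list(old)
        for i in range(1, len(old) - 1):
            new[i] = 0.5 * (old[i - 1] + old[i + 1])
        old = new
    return old


def jacobi_hybrid(u, sweeps, workers):
    """Data lives in shared buffers; synchronization uses explicit tokens."""
    n = len(u)
    interior = max(n - 2, 0)
    # Assumption: every worker owns at least one point, so that exchanging
    # tokens with direct neighbours is enough to protect the halo cells.
    workers = max(1, min(workers, interior))
    bufs = [list(u), list(u)]  # "shared memory": two buffers, swapped per sweep
    bounds = [1 + (interior * w) // workers for w in range(workers + 1)]
    # One token queue per worker and side, standing in for a dedicated link.
    inbox = [{"left": queue.Queue(), "right": queue.Queue()}
             for _ in range(workers)]

    def run(w):
        lo, hi = bounds[w], bounds[w + 1]
        for k in range(sweeps):
            old, new = bufs[k % 2], bufs[(k + 1) % 2]
            for i in range(lo, hi):
                new[i] = 0.5 * (old[i - 1] + old[i + 1])
            # Message passing: tell each neighbour that sweep k is finished.
            if w > 0:
                inbox[w - 1]["right"].put(k)
            if w < workers - 1:
                inbox[w + 1]["left"].put(k)
            # Block until both neighbours have finished sweep k as well.
            if w > 0:
                inbox[w]["left"].get()
            if w < workers - 1:
                inbox[w]["right"].get()

    threads = [threading.Thread(target=run, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[sweeps % 2]


if __name__ == "__main__":
    start = [0.0] * 15 + [1.0]
    print(jacobi_hybrid(start, sweeps=50, workers=4) == jacobi_serial(start, 50))
```

In this sketch no lock or semaphore in shared memory is polled: a worker only waits on tokens from its two neighbours, which is the software analogue of exchanging synchronization tokens over dedicated links while the bulk data stays in shared memory.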