A NoC-based hybrid message-passing/shared-memory approach to CMP design

Authors:
Mario R. Casu; Massimo Ruo Roch; Sergio V. Tota; Maurizio Zamboni
Affiliations:
Politecnico di Torino, Dipartimento di Elettronica, C.so Duca degli Abruzzi 24, I-10129 Torino, Italy (Casu, Ruo Roch, Zamboni); Imagination Technologies, Home Park Estate, Kings Langley, Hertfordshire WD4 8LZ, United Kingdom (Tota)
Venue:
Microprocessors & Microsystems
Year:
2011

Abstract

Future chip multiprocessors (CMPs) will integrate many cores interconnected by a scalable network-on-chip (NoC) with high bandwidth and low latency. However, the potential that this approach offers at the transport level must be matched by an analogous paradigm shift at the higher levels. In particular, the standard shared-memory programming model does not meet the scalability requirements of the many-core era. Fast data exchange among cores and low-latency synchronization are desirable but hard to achieve in practice because of the memory hierarchy. The message-passing paradigm instead permits direct data communication and synchronization between cores, while shared memory can still be used for instruction fetch. Hence, we propose a hybrid approach that combines shared memory and message passing in a single general-purpose CMP architecture, allowing efficient execution of applications developed with either parallel programming approach. Cores fetch instructions from a hierarchical memory and exchange data either through the same memory, for compatibility with existing software, or directly through the fast NoC. We developed a fast, cycle-accurate, SystemC-based simulator for design space exploration and used it to evaluate performance with real benchmarks. The various components were coded at RTL and mapped to a 45 nm CMOS technology to build a silicon area model, which we used to select the best architectural configurations.