Multi-CMP system with data communication on the fly

Authors:
Marek Tudruj;Lukasz Masko;Miroslaw Thor
Affiliations:
Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland 01-237 and Polish-Japanese Institute of Information Technology, Warsaw, Poland 02-008;Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland 01-237;Telemark University College, Bo i Telemark, Norway 3800
Venue:
The Journal of Supercomputing
Year:
2011

Abstract

The paper concerns new communication solutions for hierarchical Chip Multiprocessor (CMP) systems composed of many CMP modules interconnected by a global data exchange network. New architectural solutions for data communication inside a module are presented for CMP modules with hierarchical data caches. Inside CMP modules, dynamic shared-memory core clusters are organized around L1-L2 data cache busses. Such clusters enable group-oriented data communication based on reads on the fly, in which many cores at a time capture data present on a bus into their L1 banks. Cores are switched dynamically between such L1-L2 busses while carrying data in their L1 caches. Combined with data reads on the fly, this switching provides a very efficient intercluster "communication on the fly," especially useful for transfers of strongly shared data. It provides fast cache-to-cache group data transmissions and eliminates standard transactions based on shared memory in the system. The assumed architectural solutions are evaluated by comparative experiments based on automatic scheduling of program data flow graphs and execution in a simulator of the proposed architecture. The multi-CMP system structure is assessed taking into account technological limits on the size of a single CMP module.
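
The mechanism described in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator: the names `Core`, `Bus`, `watching`, and `switch` are hypothetical, and timing, bus arbitration, and cache coherence are ignored. It only shows the two ideas the abstract combines: cores on one L1-L2 bus capture a line while it passes on the bus (a read on the fly), and a core that moves to another bus carries its L1 contents and can rebroadcast them there.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    l1: dict = field(default_factory=dict)      # address -> value, stands in for L1 data
    watching: set = field(default_factory=set)  # addresses to capture on the fly (assumed)


class Bus:
    """One L1-L2 bus; the cores currently attached to it form a cluster."""

    def __init__(self, name):
        self.name = name
        self.cores = []
        self.l2 = {}  # address -> value, stands in for the L2 bank on this bus

    def attach(self, core):
        self.cores.append(core)

    def detach(self, core):
        self.cores.remove(core)

    def write(self, writer, addr):
        """Writer puts one L1 line on the bus; L2 and every watching core capture it."""
        value = writer.l1[addr]
        self.l2[addr] = value
        receivers = []
        for core in self.cores:
            if core is not writer and addr in core.watching:
                core.l1[addr] = value          # read on the fly into L1
                core.watching.discard(addr)
                receivers.append(core.name)
        return receivers


def switch(core, src, dst):
    """Move a core to another bus; its L1 contents travel with it."""
    src.detach(core)
    dst.attach(core)


# Two clusters of two cores each.
bus_a, bus_b = Bus("A"), Bus("B")
c0, c1, c2, c3 = (Core(f"c{i}") for i in range(4))
for core in (c0, c1):
    bus_a.attach(core)
for core in (c2, c3):
    bus_b.attach(core)

# c0 produces a value; c1 captures it while it passes on bus A.
c0.l1[0x10] = 42
c1.watching.add(0x10)
first = bus_a.write(c0, 0x10)

# c1 switches to bus B with the value in its L1 and rebroadcasts it there,
# so c2 and c3 receive it in a single bus transaction.
switch(c1, bus_a, bus_b)
c2.watching.add(0x10)
c3.watching.add(0x10)
second = bus_b.write(c1, 0x10)
```

In this model a value reaches every interested core of the second cluster with one core switch and one bus write, which is the intuition behind the cache-to-cache group transfers the abstract mentions.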