A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture

Authors:
Ying Ping Zhang;Taikyeong Jeong;Fei Chen;Haiping Wu;Ronny Nitzsche;Guang R. Gao
Affiliations:
University of Delaware, Department of Electrical and Computer Engineering, Newark, Delaware;University of Delaware, Department of Electrical and Computer Engineering, Newark, Delaware;University of Delaware, Department of Electrical and Computer Engineering, Newark, Delaware;University of Delaware, Department of Electrical and Computer Engineering, Newark, Delaware;University of Delaware, Department of Electrical and Computer Engineering, Newark, Delaware;University of Delaware, Department of Electrical and Computer Engineering, Newark, Delaware
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

The design of high-performance processor architectures is moving toward integrating a large number of processing cores on a single chip. The IBM Cyclops-64 (C64) is a petaflop supercomputer built on multi-core system-on-a-chip technology. Each C64 chip employs a multistage pipelined crossbar switch as its on-chip interconnection network to provide high-bandwidth, low-latency communication between the 160 thread processing cores, the on-chip SRAM memory banks, and other components. In this paper, we present a simulation-based study of the architecture and performance of the C64 on-chip interconnection network. Our experimental results provide the following observations on the network behavior: (1) Dedicated channels can be created between any output port and any input port of the C64 crossbar with latency as low as 7 cycles. The C64 crossbar has the potential to reach the full hardware bandwidth and exhibits non-blocking behavior; (2) The C64 crossbar is a stable network; (3) The network logic design appears to provide a reasonable opportunity for sharing the channel bandwidth between traffic in both directions; (4) A simple circular neighbor arbitration scheme can achieve performance competitive with the complex segmented LRU (Least Recently Used) matrix arbitration scheme without losing fairness; (5) Application-driven benchmarks provide results comparable to those of synthetic workloads.
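
To make the arbitration comparison in observation (4) more concrete, the following is a minimal sketch of a generic rotating-priority (round-robin) arbiter for a single crossbar output port. It is an illustration under stated assumptions, not the C64 hardware logic: the abstract does not define the circular neighbor scheme in detail, so the class name `RoundRobinArbiter`, the `grant` method, and the rule that priority passes to the neighbor after the most recent winner are all hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Rotating-priority arbiter for one output port (hypothetical sketch)."""

    def __init__(self, num_inputs: int) -> None:
        if num_inputs <= 0:
            raise ValueError("num_inputs must be positive")
        self.num_inputs = num_inputs
        # Index of the input that currently holds the highest priority.
        self.next_priority = 0

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the granted input index, or None if no input is requesting."""
        if len(requests) != self.num_inputs:
            raise ValueError("one request flag per input is required")
        # Scan circularly, starting at the current highest-priority input.
        for offset in range(self.num_inputs):
            candidate = (self.next_priority + offset) % self.num_inputs
            if requests[candidate]:
                # Assumption: the neighbor after the winner goes first next time.
                self.next_priority = (candidate + 1) % self.num_inputs
                return candidate
        return None


# Inputs 0, 2, and 3 request on every cycle; input 1 stays idle.
arbiter = RoundRobinArbiter(4)
always_requesting = [True, False, True, True]
grants = [arbiter.grant(always_requesting) for _ in range(6)]
print(grants)  # [0, 2, 3, 0, 2, 3]
```

In this sketch each persistently requesting input is served once per rotation, which is the kind of fairness property the abstract attributes to the simpler scheme. The only state is a single priority pointer, whereas an LRU matrix arbiter typically tracks pairwise ordering among inputs.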