On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus

Authors:
Thomas William Ainsworth;Timothy Mark Pinkston
Affiliations:
University of Southern California, Los Angeles, USA;University of Southern California, Los Angeles, USA
Venue:
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Year:
2007



Abstract

With the rise of multicore computing, the design of on-chip networks (or networks on chip) has become an increasingly important part of computer architecture. The Cell Broadband Engine's Element Interconnect Bus (EIB), with its four data rings and shared command bus for end-to-end control, supports twelve nodes, more than most mainstream on-chip networks, which makes it an interesting case study. As a first step toward understanding the design and performance of on-chip networks implemented in a commercial multicore chip, this paper analytically evaluates the EIB using conventional latency and throughput characterization methods as well as a recently proposed 5-tuple latency characterization model for on-chip networks. Both analyses identify the end-to-end control component of the EIB (i.e., the shared command bus) as the main bottleneck to achieving the minimal, single-cycle latency and the maximal 307.2 GB/sec raw effective bandwidth that the EIB natively provides. Poorly designed Cell software can exacerbate this bottleneck, since software has a significant impact on EIB utilization. The main finding of this study is that the end-to-end control of the EIB, as influenced by software running on the Cell, has inherent scaling problems and is the main limiter of overall network performance. Thus, end-to-end effects must not be overlooked when designing efficient networks on chip.
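
The 307.2 GB/sec figure can be reproduced with simple arithmetic. The sketch below is illustrative only: apart from the four data rings, none of its parameters appear in the abstract. The bus clock, ring width, number of concurrent transfers per ring, and bytes moved per command are assumptions taken from commonly cited descriptions of the Cell, and the function names are hypothetical. Under these assumptions the data rings yield the 307.2 GB/sec raw figure, while a command bus that issues one command per bus cycle caps throughput at a lower value, which is consistent with the bottleneck the abstract describes.

```python
# Illustrative sketch, not the authors' model.
# Every constant except NUM_RINGS is an assumption not stated in the abstract.

BUS_CLOCK_HZ = 1_600_000_000   # assumed: EIB clocked at half of a 3.2 GHz core
RING_WIDTH_BYTES = 16          # assumed: bytes moved per ring transfer per bus cycle
NUM_RINGS = 4                  # from the abstract: four data rings
TRANSFERS_PER_RING = 3         # assumed: concurrent transfers supported per ring
BYTES_PER_COMMAND = 128        # assumed: data moved per command bus grant


def raw_ring_bandwidth_gbs():
    """Peak data-ring bandwidth in GB/sec with all ring slots busy."""
    concurrent_transfers = NUM_RINGS * TRANSFERS_PER_RING
    return concurrent_transfers * RING_WIDTH_BYTES * BUS_CLOCK_HZ / 1e9


def command_bus_limit_gbs(commands_per_cycle=1):
    """Bandwidth ceiling in GB/sec imposed by the shared command bus."""
    return commands_per_cycle * BYTES_PER_COMMAND * BUS_CLOCK_HZ / 1e9


if __name__ == "__main__":
    print("raw ring bandwidth (GB/sec):", raw_ring_bandwidth_gbs())
    print("command bus limit (GB/sec):", command_bus_limit_gbs())
```

Keeping the two limits as separate functions makes it easy to see that adding rings or ring slots raises only the first number, while the second depends solely on how many commands the shared bus can issue per cycle.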