With the rise of multicore computing, the design of on-chip networks (or networks on chip) has become an increasingly important component of computer architecture. The Cell Broadband Engine's Element Interconnect Bus (EIB), with its four data rings and a shared command bus for end-to-end control, supports twelve nodes, more than most mainstream on-chip networks, which makes it an interesting case study. As a first step toward understanding the design and performance of on-chip networks implemented within a commercial multicore chip, this paper analytically evaluates the EIB network using conventional latency and throughput characterization methods as well as a recently proposed 5-tuple latency characterization model for on-chip networks. These analyses identify the end-to-end control component of the EIB (i.e., the shared command bus) as the main bottleneck to achieving the minimal, single-cycle latency and the maximal 307.2 GB/sec raw effective bandwidth that the EIB provides natively. Poorly designed Cell software can exacerbate this bottleneck, because software has a significant impact on the utilization of the EIB. The main findings of this study are that the end-to-end control of the EIB, as influenced by software running on the Cell, has inherent scaling problems and serves as the main limiter of overall network performance. Thus, end-to-end effects must not be overlooked when designing efficient networks on chip.
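The 307.2 GB/sec figure quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below is illustrative only. The abstract states that the EIB has four data rings; the remaining parameters (three concurrent transfers per ring, 16 bytes per ring per bus cycle, a 1.6 GHz bus clock, and one 128-byte command issued per bus cycle) are assumptions drawn from commonly published Cell descriptions, not from this text. The function names are hypothetical.

```python
# Back-of-the-envelope EIB bandwidth sketch.
# Only RINGS comes from the abstract; every other constant is an
# ASSUMPTION taken from commonly published Cell/B.E. descriptions.

RINGS = 4                    # data rings (stated in the abstract)
TRANSFERS_PER_RING = 3       # assumed: concurrent transfers per ring
BYTES_PER_CYCLE = 16         # assumed: bytes moved per transfer per bus cycle
BUS_CLOCK_GHZ = 1.6          # assumed: EIB clock, half of a 3.2 GHz core clock
BYTES_PER_COMMAND = 128      # assumed: data covered by one command
COMMANDS_PER_CYCLE = 1       # assumed: command bus issue rate


def raw_ring_bandwidth_gb_s(rings, transfers_per_ring, bytes_per_cycle, clock_ghz):
    """Peak data-ring bandwidth in GB/s if every ring slot is busy."""
    return rings * transfers_per_ring * bytes_per_cycle * clock_ghz


def command_bus_bandwidth_gb_s(bytes_per_command, commands_per_cycle, clock_ghz):
    """Upper bound on data bandwidth that the shared command bus can initiate."""
    return bytes_per_command * commands_per_cycle * clock_ghz


raw = raw_ring_bandwidth_gb_s(RINGS, TRANSFERS_PER_RING, BYTES_PER_CYCLE, BUS_CLOCK_GHZ)
ctrl = command_bus_bandwidth_gb_s(BYTES_PER_COMMAND, COMMANDS_PER_CYCLE, BUS_CLOCK_GHZ)

print(f"raw ring bandwidth:   {raw:.1f} GB/s")   # prints 307.2
print(f"command-bus ceiling:  {ctrl:.1f} GB/s")  # prints 204.8
print(f"achievable fraction:  {min(raw, ctrl) / raw:.2f}")
```

Under these assumed parameters the command-bus ceiling falls below the raw ring bandwidth, which is consistent with the abstract's claim that end-to-end control, rather than the data rings, limits the network.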