An analytical network performance model for SIMD processor CSX600 interconnects

Authors:
Yuri Nishikawa; Michihiro Koibuchi; Masato Yoshimi; Kenichi Miura; Hideharu Amano
Affiliations:
Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi Kouhoku-ku Yokohama, Kanagawa 223-8522, Japan (Nishikawa, Amano); National Institute of Informatics, 2-1-2 Hitotsubashi Chiyoda-ku, Tokyo 101-8430, Japan (Koibuchi, Miura); Doshisha University, 1-3 Tatara Miyakodani, Kyotanabe-shi, Kyoto 610-0321, Japan (Yoshimi)
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2011

Abstract

One of the essential factors in efficiently implementing and tuning applications on a SIMD many-core processor is familiarity with the organization and performance of its network-on-chip (NoC) architecture. This paper focuses on modeling the end-to-end latency of a SIMD many-core processor with one-dimensional NoCs. To characterize the actual end-to-end latency of modern SIMD many-core processors precisely and practically, this work analyzes the performance of Swazzle and ClearConnect, both of which are one-dimensional NoCs of ClearSpeed's CSX600, a SIMD processor consisting of 96 processing elements (PEs). The evaluation and analysis show that three factors dominate the network performance of the CSX600: (1) the number of PEs used, (2) the size of the transferred data, and (3) the data alignment in shared memory. Based on these observations, we built a model for computing communication time and used it to estimate the best- and worst-case latencies of traffic patterns taken from several parallel application benchmarks. Finally, we confirmed that the measured communication times of the benchmarks fall between the best- and worst-case values.
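The abstract does not give the model's functional form. The Python sketch below is therefore purely illustrative. It assumes a hypothetical linear model in which communication time grows with the number of PEs used and with the transfer size, and misaligned shared-memory data multiplies the cost by a fixed factor. The class `ToyCommModel`, its coefficients, and the helper `within_bounds` are invented for illustration. They are not the authors' model or measured CSX600 values. What the sketch does show is the bounding idea: evaluate the model under a best-case (aligned) and a worst-case (misaligned) assumption, then check that a measured time falls between the two.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCommModel:
    """Hypothetical linear communication-time model (NOT the paper's model).

    Coefficients are illustrative placeholders in arbitrary time units,
    not measured CSX600 values.
    """

    startup: float          # assumed fixed overhead per transfer
    per_pe: float           # assumed extra cost per participating PE
    per_byte: float         # assumed cost per transferred byte
    misalign_factor: float  # assumed slowdown when shared-memory data is misaligned

    def __post_init__(self) -> None:
        # A factor below 1 would make the "worst case" faster than the best case.
        if self.misalign_factor < 1.0:
            raise ValueError("misalign_factor must be >= 1")

    def time(self, num_pes: int, size_bytes: int, aligned: bool) -> float:
        """Estimated communication time for one transfer pattern."""
        base = self.startup + self.per_pe * num_pes + self.per_byte * size_bytes
        return base if aligned else base * self.misalign_factor

    def bounds(self, num_pes: int, size_bytes: int) -> tuple[float, float]:
        """Best case assumes aligned data; worst case assumes misaligned data."""
        best = self.time(num_pes, size_bytes, aligned=True)
        worst = self.time(num_pes, size_bytes, aligned=False)
        return best, worst


def within_bounds(model: ToyCommModel, num_pes: int, size_bytes: int,
                  measured: float) -> bool:
    """Check whether a measured time lies between the model's best and worst cases."""
    best, worst = model.bounds(num_pes, size_bytes)
    return best <= measured <= worst


if __name__ == "__main__":
    # Placeholder coefficients; 96 matches the CSX600's PE count.
    model = ToyCommModel(startup=20.0, per_pe=0.5, per_byte=0.25, misalign_factor=2.0)
    print(model.bounds(96, 64))                 # (84.0, 168.0)
    print(within_bounds(model, 96, 64, 120.0))  # True
```

With these placeholder coefficients, a 64-byte transfer involving all 96 PEs gives bounds of 84.0 and 168.0 time units, so a measured value of 120.0 passes the check. Modeling misalignment as a multiplier keeps the best and worst cases ordered as long as the factor is at least 1, which the constructor enforces.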