Performance Improvement Methodology for ClearSpeed's CSX600

Authors:
Yuri Nishikawa; Michihiro Koibuchi; Masato Yoshimi; Kenichi Miura; Hideharu Amano
Affiliations:
Keio University, Japan; National Institute of Informatics, Japan; Keio University, Japan; National Institute of Informatics, Japan; Keio University, Japan
Venue:
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
Year:
2007

Abstract

This paper focuses on the performance of the network-on-a-chip (NoC) and I/O of ClearSpeed's CSX600 coprocessor, which has 96 multithreaded processing elements (PEs). Two versions of the Himeno Benchmark were implemented on the CSX600 to evaluate its performance under frequent memory transfers, either between shared and local memories or between local memories. To use the NoC bandwidth efficiently, the dataflow was customized to the one-dimensional array structure of the CSX600's NoC. Evaluation and profiling results indicate that the achieved performance was lower than 1/50 of the sustained performance. We identify three key points for improving performance in such cases: 1) exploiting the bandwidth between mono (shared) and poly (PE-local) memory, 2) further program tuning, and 3) architectural reform.
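The abstract does not include the authors' code, so the following is only a hedged illustration of what "customizing the dataflow to a one-dimensional array" can mean for a stencil code. It uses a simplified 2D 5-point Jacobi sweep (the real Himeno kernel is a larger 3D stencil) and simulates the PEs in plain Python. Each hypothetical PE holds a contiguous slab of rows in its local memory. On every sweep it reads just one halo row from each immediate left/right neighbour on the chain, which is the nearest-neighbour-only traffic a linear NoC favours. The functions `jacobi_step`, `split_rows`, `chain_jacobi_step` and `demo` are hypothetical names for this sketch and do not correspond to any ClearSpeed API.

```python
"""Sketch (assumption, not the paper's implementation): Jacobi relaxation on a
2D grid, row-decomposed over a hypothetical linear chain of PEs that may only
read data from their immediate left/right neighbours."""

from typing import List

Grid = List[List[float]]


def jacobi_step(grid: Grid) -> Grid:
    """Reference: one 5-point Jacobi sweep on the whole grid; edges stay fixed."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def split_rows(grid: Grid, npe: int) -> List[Grid]:
    """Give each hypothetical PE a contiguous, non-empty slab of rows."""
    n = len(grid)
    assert 1 <= npe <= n, "need at least one row per PE"
    bounds = [n * p // npe for p in range(npe + 1)]
    return [[row[:] for row in grid[bounds[p]:bounds[p + 1]]] for p in range(npe)]


def chain_jacobi_step(slabs: List[Grid]) -> List[Grid]:
    """One Jacobi sweep where PE p sees only its own rows plus one halo row
    from PE p-1 and one from PE p+1 (nearest-neighbour exchange only)."""
    npe = len(slabs)
    new_slabs = []
    for p, slab in enumerate(slabs):
        top = slabs[p - 1][-1] if p > 0 else None          # halo from left neighbour
        bottom = slabs[p + 1][0] if p < npe - 1 else None  # halo from right neighbour
        ext = ([top] if top is not None else []) + slab + ([bottom] if bottom is not None else [])
        off = 0 if top is None else 1                      # row k of slab is ext[k + off]
        m = len(slab[0])
        out = [row[:] for row in slab]
        for k in range(len(slab)):
            e = k + off
            if e == 0 or e == len(ext) - 1:
                continue  # first/last row of the whole grid stays fixed
            for j in range(1, m - 1):
                out[k][j] = 0.25 * (ext[e - 1][j] + ext[e + 1][j]
                                    + ext[e][j - 1] + ext[e][j + 1])
        new_slabs.append(out)
    return new_slabs


def demo(n: int = 12, m: int = 8, npe: int = 4, iters: int = 5):
    """Run the reference and the chain-decomposed version side by side."""
    grid = [[float((i * 7 + j * 3) % 5) for j in range(m)] for i in range(n)]
    ref = grid
    for _ in range(iters):
        ref = jacobi_step(ref)
    slabs = split_rows(grid, npe)
    for _ in range(iters):
        slabs = chain_jacobi_step(slabs)
    merged = [row for slab in slabs for row in slab]
    return ref, merged


if __name__ == "__main__":
    ref, merged = demo()
    print("decomposed result matches reference:", ref == merged)
```

Both versions add the four neighbours in the same order, so the decomposed result is bit-identical to the undecomposed sweep. Per sweep, each PE transfers at most two rows, and only to adjacent PEs. This is the property that keeps the communication suited to a one-dimensional interconnect.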