Implications of memory performance for highly efficient supercomputing of scientific applications

Authors:
Akihiro Musa;Hiroyuki Takizawa;Koki Okabe;Takashi Soga;Hiroaki Kobayashi
Affiliations:
Tohoku University, Sendai, Japan;Tohoku University, Sendai, Japan;Tohoku University, Sendai, Japan;NEC System Technologies, Osaka, Japan;Tohoku University, Sendai, Japan
Venue:
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Year:
2006

Abstract

This paper examines the memory performance of vector-parallel and scalar-parallel computing platforms on five applications drawn from three scientific areas: electromagnetic analysis, CFD/heat analysis, and seismology. Our evaluation shows that the vector platforms achieve high computational efficiency and therefore significantly outperform the scalar platforms on these applications. We conducted exhaustive experiments and quantitatively evaluated representative scalar and vector platforms using real applications, from the viewpoint of system designers and developers. The results show that, on the vector platforms, the ratio of memory bandwidth to floating-point operation rate needs to reach 4 bytes/flop to sustain computational performance while pipelined vector operations hide memory access latencies. We also confirm that providing enough memory banks to handle strided memory accesses increases execution efficiency. On the scalar platforms, the cache hit rate needs to be almost 100% to achieve high computational efficiency.
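The three quantitative claims above (a 4 bytes/flop requirement, enough banks for strided access, and a near-100% cache hit rate) can be illustrated with simple first-order models. The sketch below is a hedged illustration, not the authors' evaluation methodology. It assumes a roofline-style bound with memory latency fully hidden, low-order interleaved memory banks (bank = word address mod number of banks), and the textbook average-memory-access-time formula. All function names and numeric values (peak rate, bandwidth, bank count, latencies) are hypothetical and are not measurements from the paper.

```python
from math import gcd


def machine_bytes_per_flop(bandwidth_gbs: float, peak_gflops: float) -> float:
    """Machine balance: memory bandwidth divided by peak floating-point rate."""
    return bandwidth_gbs / peak_gflops


def sustained_gflops(peak_gflops: float, bandwidth_gbs: float,
                     app_bytes_per_flop: float) -> float:
    """Roofline-style bound (assumption): a kernel that moves app_bytes_per_flop
    bytes per flop runs no faster than peak and no faster than
    bandwidth / app_bytes_per_flop. Latency is assumed fully hidden,
    as pipelined vector operations aim to do."""
    return min(peak_gflops, bandwidth_gbs / app_bytes_per_flop)


def banks_touched(stride: int, num_banks: int) -> int:
    """With low-order interleaving (assumption), a stride-s access stream
    cycles through num_banks / gcd(s, num_banks) distinct banks.
    Fewer banks touched means more bank conflicts and stalls."""
    return num_banks // gcd(stride, num_banks)


def avg_access_time(hit_time: float, miss_penalty: float, hit_rate: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


if __name__ == "__main__":
    # Hypothetical machines with an 8 GFLOPS peak and a kernel needing 4 B/flop.
    print(sustained_gflops(8.0, 32.0, 4.0))  # 8.0: 4 B/flop machine sustains peak
    print(sustained_gflops(8.0, 16.0, 4.0))  # 4.0: 2 B/flop machine halves it

    # Hypothetical 32-bank memory: power-of-two strides concentrate on few banks.
    for s in (1, 2, 3, 32):
        print(s, banks_touched(s, 32))  # 32, 16, 32, 1 banks respectively

    # Hypothetical 1-cycle hit, 100-cycle miss: 99% vs 90% hit rate.
    print(round(avg_access_time(1.0, 100.0, 0.99), 6))  # 2.0 cycles
    print(round(avg_access_time(1.0, 100.0, 0.90), 6))  # 11.0 cycles
```

Under these assumptions, halving a machine's bytes/flop halves the sustained rate of a 4 B/flop kernel, a stride equal to the bank count serializes every access on one bank, and dropping the hit rate from 99% to 90% more than quintuples the average access time. This is consistent in direction with the abstract's conclusions, though the paper's actual figures come from measured applications rather than this model.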