Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

Authors:
Leonid Oliker;Andrew Canning;Jonathan Carter;John Shalf;David Skinner;Stephane Ethier;Rupak Biswas;Jahed Djomehri;Rob Van der Wijngaart
Affiliations:
CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA;CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA;CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA;CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA;CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA;Princeton University, NJ;NASA Ames Research Center, Moffett Field, CA;NASA Ames Research Center, Moffett Field, CA;NASA Ames Research Center, Moffett Field, CA
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Abstract

The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-end computing. The recent development of parallel vector systems offers the potential to bridge this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas. First, we present the performance of a microbenchmark suite that examines low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Results demonstrate that the SX-6 achieves high performance on a large fraction of our applications and often significantly outperforms the cache-based architectures. However, certain applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.