Computational scientists have seen a frustrating trend of stagnating application performance despite dramatic increases in the claimed peak capability of high performance computing systems. This trend has been widely attributed to the use of superscalar-based commodity components whose architectural designs offer a balance between memory performance, network capability, and execution rate that is poorly matched to the requirements of large-scale numerical computations. Recently, two innovative parallel-vector architectures have become operational: the Japanese Earth Simulator (ES) and the Cray X1. To quantify what these modern vector capabilities entail for the scientists who rely on modeling and simulation, it is critical to evaluate this architectural paradigm in the context of demanding computational algorithms. Our evaluation study examines four diverse scientific applications with the potential to run at ultrascale, from the areas of plasma physics, material science, astrophysics, and magnetic fusion. We compare the performance of the vector-based ES and X1 with that of leading superscalar-based platforms: the IBM Power3/4 and the SGI Altix. Our research team was the first international group to conduct a performance evaluation study at the Earth Simulator Center; remote ES access is not available. Results demonstrate that the vector systems achieve excellent performance on our application suite, the highest of any architecture tested to date. However, vectorization of a particle-in-cell code highlights the potential difficulty of expressing irregularly structured algorithms as data-parallel programs.