Over the last decade, the scientific computing community has optimized many applications for execution on superscalar computing platforms. The recent arrival of the Japanese Earth Simulator has revived interest in vector architectures, especially in the US. It is therefore important to examine how to port current scientific applications to the new vector platforms and how to achieve high performance on them. The success of these ports will also influence the acceptance of new vector architectures. In this paper, we first investigate the memory performance characteristics of the Cray X1, a recently released vector platform, and determine the most influential performance factors. We then examine how to optimize applications tuned for superscalar platforms for the Cray X1, using its performance characteristics as guidelines. Finally, we evaluate the different types of optimizations used, the effort required to implement them, and whether they provide any performance benefit when ported back to superscalar platforms.