In 2002, Japan announced the Earth Simulator, a supercomputer based on low-volume vector processors and a custom network, and reported that computational scientists had used it to achieve 14.9 TFLOPS with IMPACT-3D, a code written in High Performance Fortran (HPF). Of particular interest was that this level of performance had been achieved using a high-level parallel programming model. There has been considerable concern in the U.S. about whether its hardware and software investments in supercomputing technology are appropriate. To help assess the U.S. strategy of building systems from commodity-off-the-shelf (COTS) components, we explored using a combination of HPF and scalar compiler technology to tailor IMPACT-3D to microprocessor-based supercomputers, and we evaluated its performance and scalability on the AlphaServer-based Lemieux cluster at the Pittsburgh Supercomputing Center (PSC). On the Earth Simulator, IMPACT-3D achieved 45% of peak performance on 4096 processors; on 1024 processors of PSC's Lemieux, we achieved 17.29% of peak performance.