Direct numerical simulation (DNS) of the Navier-Stokes equations is an important technique for the future of computational fluid dynamics (CFD) in engineering applications. However, DNS requires massive computing resources. This paper presents a new approach for implementing high-cost DNS CFD using low-cost cluster hardware. After describing the DNS CFD code DNSTool, the paper focuses on the techniques and tools that we have developed to tune the performance of a cluster implementation of this application. This tuning involves both recoding the application and careful engineering of the cluster design. On the cluster KLAT2 (Kentucky Linux Athlon Testbed 2), DNSTool cannot match the $0.64 per Mflops that KLAT2 achieves on single-precision ScaLAPACK, but it is still very efficient: DNSTool on KLAT2 achieves a price/performance of $2.75 per Mflops in double precision and $1.86 per Mflops in single precision. Further, the code and tools are all, or will soon be, freely available as full source code.