This paper gives a technical discussion of the Intel Pentium® Pro processor and the optimization strategies used to achieve high performance on scientific applications. We demonstrate these optimizations by characterizing matrix multiplication (DGEMM). We also describe, and give a model of, our effort to obtain the world's first TeraFLOP MP LINPACK run, on the Intel ASCI Option Red Supercomputer, which is based on Pentium Pro processor technology. The work is significant because commodity parts are increasingly used in supercomputing.