The effect of time constraints on scaled speedup
SIAM Journal on Scientific and Statistical Computing
The design of a scalable, fixed-time computer benchmark
Journal of Parallel and Distributed Computing
Memory contention for shared memory vector multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Scalable load balancing techniques for parallel computers
Journal of Parallel and Distributed Computing
Scalability issues affecting the design of a dense linear algebra library
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Massively Parallel Linpack Benchmark on the Intel Touchstone Delta and iPSC/860 Systems (Progress Report)
A parallel numerical library for Co-Array Fortran
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
A metric space for computer programs and the principle of computational least action
The Journal of Supercomputing
Co-arrays in the next Fortran Standard
Scientific Programming - Fortran Programming Language and Scientific Programming: 50 Years of Mutual Growth
Computational forces in the Linpack benchmark
Journal of Parallel and Distributed Computing
Computational forces in the SAGE benchmark
Journal of Parallel and Distributed Computing
Dimensional analysis applied to a parallel QR algorithm
PPAM'07 Proceedings of the 7th international conference on Parallel Processing and Applied Mathematics
Self-similarity of parallel machines
Parallel Computing
An early comparison of commercial and open-source cloud platforms for scientific environments
KES-AMSTA'12 Proceedings of the 6th KES international conference on Agent and Multi-Agent Systems: technologies and applications
Computer performance analysis and the Pi Theorem
Computer Science - Research and Development
Dimensional analysis yields a new scaling formula for the Linpack benchmark. The computational power r(p₀, q₀) on a set of processors decomposed into a (p₀, q₀) grid determines the computational power r(p, q) on a set of processors decomposed into a (p, q) grid by the formula r(p, q) = (p/p₀)^α (q/q₀)^β r(p₀, q₀). The two scaling parameters α and β measure the interprocessor communication overhead required by the algorithm. A machine that scales perfectly corresponds to α = β = 1; a machine that does not scale at all corresponds to α = β = 0. We determined the two scaling parameters by imposing a fixed-time constraint on the problem size such that the execution time remains constant as the number of processors changes. Results for a collection of machines confirm that the formula suggested by dimensional analysis is correct. Machines with the same values for these parameters are self-similar: they scale the same way even though the details of their specific hardware and software may be quite different.
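The scaling formula from the abstract can be sketched as a small function. This is a minimal illustration, not code from the paper; the function name and the sample parameter values (α = β = 1 for perfect scaling, α = β = 0 for no scaling) are chosen here for demonstration.

```python
def scaled_power(p, q, p0, q0, r0, alpha, beta):
    """Predict computational power r(p, q) on a (p, q) processor grid
    from a baseline measurement r0 = r(p0, q0) on a (p0, q0) grid,
    using r(p, q) = (p/p0)**alpha * (q/q0)**beta * r0."""
    return (p / p0) ** alpha * (q / q0) ** beta * r0

# Perfect scaling (alpha = beta = 1): quadrupling the grid quadruples power.
perfect = scaled_power(4, 4, 2, 2, 10.0, 1.0, 1.0)   # -> 40.0

# No scaling (alpha = beta = 0): extra processors add nothing.
flat = scaled_power(4, 4, 2, 2, 10.0, 0.0, 0.0)      # -> 10.0
```

In practice the parameters α and β would be fitted from fixed-time benchmark runs at several grid sizes; intermediate values between 0 and 1 then quantify how much communication overhead erodes the ideal speedup.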