Computational forces in the SAGE benchmark

Authors:
Robert W. Numrich
Affiliations:
Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, United States
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Abstract

Dimensional analysis applied to a complicated timing formula for the SAGE benchmark yields new insight into the limits to scalability. A single surface, defined by two curvilinear coordinates, describes the parallel efficiency of the benchmark. Each machine, as a function of the number of processors, follows its own path on the surface determined by dimensionless ratios of hardware forces to software forces. Two machines with the same ratios follow the same path and are self-similar, even though the numerical value of each individual force may be different. For this benchmark, latency effects are unimportant relative to bandwidth effects because of the slab decomposition used to distribute the problem across processors. To a good first-order approximation, a single force ratio describes the efficiency as a function of the number of processors. A simpler model, with a single dimensionless exponent, describes the first-order behavior of the computational power as a function of the number of processors.
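
The self-similarity claim can be illustrated with a small numerical sketch. The timing model below is a hypothetical toy, not the SAGE timing formula from the paper: it assumes a fixed amount of work divided evenly among p processors, plus a bandwidth-limited exchange whose per-processor volume does not depend on p (roughly what a slab decomposition would give), and it ignores latency. The names `work`, `rate`, `volume`, `bandwidth`, `force_ratio`, and both example machines are assumptions made for illustration only.

```python
import math


def force_ratio(work, rate, volume, bandwidth):
    """Dimensionless ratio of communication time to compute time.

    Toy stand-in for a hardware-to-software force ratio; the paper's
    actual definitions are not reproduced here.
    """
    return (volume / bandwidth) / (work / rate)


def efficiency(p, work, rate, volume, bandwidth):
    """Parallel efficiency T(1) / (p * T(p)) for the toy model."""
    t_compute = work / rate  # single-processor compute time
    # Assumption: no exchange on one processor, constant volume otherwise.
    t_comm = volume / bandwidth if p > 1 else 0.0
    t_parallel = t_compute / p + t_comm
    return t_compute / (p * t_parallel)


# Two hypothetical machines: B computes and communicates four times
# faster than A, so the individual numbers differ but the ratio does not.
machine_a = dict(work=1.0e12, rate=1.0e9, volume=1.0e8, bandwidth=1.0e8)
machine_b = dict(work=1.0e12, rate=4.0e9, volume=1.0e8, bandwidth=4.0e8)

if __name__ == "__main__":
    print("ratio A:", force_ratio(**machine_a))
    print("ratio B:", force_ratio(**machine_b))
    for p in (1, 10, 100, 1000):
        e_a = efficiency(p, **machine_a)
        e_b = efficiency(p, **machine_b)
        print(p, round(e_a, 4), round(e_b, 4), math.isclose(e_a, e_b))
```

In this toy model the efficiency for p > 1 reduces to 1 / (1 + p * ratio), so any two machines with equal ratios trace the same efficiency curve regardless of their absolute speeds. That is the sense of self-similarity used in the abstract, shown here only under the stated simplifying assumptions.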