Dimensional analysis applied to a complicated timing formula for the SAGE benchmark yields new insight into the limits to scalability. A single surface, defined by two curvilinear coordinates, describes the parallel efficiency of the benchmark. Each machine, as a function of the number of processors, follows its own path on the surface determined by dimensionless ratios of hardware forces to software forces. Two machines with the same ratios follow the same path and are self-similar, even though the numerical value of each individual force may be different. For this benchmark, latency effects are unimportant relative to bandwidth effects because of the slab decomposition used to distribute the problem across processors. To a good first-order approximation, a single force ratio describes the efficiency as a function of the number of processors. A simpler model, with a single dimensionless exponent, describes the first-order behavior of the computational power as a function of the number of processors.
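The abstract does not give the form of the simpler single-exponent model. As a minimal sketch, assume it is a power law, R(p) = R(1) * p**alpha, where R is the computational power, p is the number of processors, and alpha is the dimensionless exponent. The names `power`, `efficiency`, and `fit_exponent`, the power-law form itself, and the sample values below are all illustrative assumptions, not taken from the paper. Under that assumption the parallel efficiency is p**(alpha - 1), and alpha can be estimated as the slope of a log-log fit.

```python
import math


def power(p, r1, alpha):
    """Assumed power-law model (hypothetical): R(p) = r1 * p**alpha."""
    return r1 * p ** alpha


def efficiency(p, alpha):
    """Efficiency implied by the assumed model: R(p) / (p * R(1)) = p**(alpha - 1)."""
    return p ** (alpha - 1.0)


def fit_exponent(procs, rates):
    """Estimate alpha as the least-squares slope of log(rate) against log(p)."""
    xs = [math.log(p) for p in procs]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data generated from the assumed model itself (not measurements).
procs = [1, 2, 4, 8, 16, 32]
rates = [power(p, 1.0, 0.9) for p in procs]
alpha_hat = fit_exponent(procs, rates)
```

With measured rates in place of the synthetic ones, an alpha close to 1 would indicate nearly linear scaling, and smaller values would indicate efficiency falling off as processors are added. Whether a single exponent fits well is an empirical question for each machine.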