Computer performance analysis and the Pi Theorem

Authors: Robert W. Numrich
Affiliations: Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, USA
Venue: Computer Science - Research and Development
Year: 2014



Abstract

This paper applies the Pi Theorem of dimensional analysis to a representative set of examples from computer performance analysis. It is a survey paper that takes a different look at problems involving latency, bandwidth, cache-miss ratios, and the efficiency of parallel numerical algorithms. The Pi Theorem is the fundamental tool of dimensional analysis, and it applies to problems in computer performance analysis just as well as it does to problems in other sciences. Applying it first requires defining a system of measurement appropriate for computer performance analysis, with a consistent set of units and dimensions. A straightforward recipe then reduces the independent variables of each specific problem to a smaller number of dimensionless parameters. Two machines with the same values of these parameters are self-similar and behave the same way. Self-similarity relationships emphasize how machines are the same rather than how they are different. The Pi Theorem is simple to state and simple to prove using purely algebraic methods, but the results that follow from it are often unexpected and far from simple, and they almost always reveal something new about the problem at hand.
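
The recipe mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the choice of problem (a linear latency-bandwidth model t = t0 + n/r), the two base dimensions (time T and bytes B), and all names such as `DIMENSION_MATRIX`, `nullspace`, and `scaled_time` are assumptions made for illustration. Each variable is a column of exponents in a dimension matrix, and every null-space vector of that matrix gives the exponents of one dimensionless parameter.

```python
from fractions import Fraction

# Hypothetical example: message time t, size n, latency t0, bandwidth r.
# Columns are ordered so the "repeating" variables (t0, r) come first.
VARIABLES = ["t0", "r", "t", "n"]
DIMENSION_MATRIX = [
    [1, -1, 1, 0],  # exponent of time T in t0, r, t, n
    [0, 1, 0, 1],   # exponent of bytes B in t0, r, t, n
]

def nullspace(matrix):
    """Basis of the null space via exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[fc] = Fraction(1)  # one free variable set to 1, the rest to 0
        for row_idx, pc in enumerate(pivots):
            v[pc] = -rows[row_idx][fc]
        basis.append(v)
    return basis

def describe(vector):
    """Render exponents as a product of powers, e.g. 't0^-1 * t^1'."""
    return " * ".join(f"{name}^{e}" for name, e in zip(VARIABLES, vector) if e != 0)

def scaled_time(n, t0, r):
    """Dimensionless form of the assumed model: t/t0 = 1 + n/(r*t0)."""
    return 1 + Fraction(n) / (Fraction(r) * Fraction(t0))

# Four variables minus two independent dimensions leaves two parameters.
PI_GROUPS = nullspace(DIMENSION_MATRIX)
for group in PI_GROUPS:
    print(describe(group))

# Two hypothetical machines (t0 in ns, r in bytes/ns, n in bytes) that share
# the same value of n/(r*t0) also share the same scaled time t/t0.
machine_a = scaled_time(n=1000, t0=1000, r=1)
machine_b = scaled_time(n=8000, t0=2000, r=4)
print(machine_a == machine_b)
```

Under these assumptions the four variables collapse to two dimensionless parameters, t/t0 and n/(r t0), and the two hypothetical machines are self-similar in the abstract's sense even though their latencies and bandwidths differ. Putting t0 and r in the first columns is a design choice: it makes them the scaling variables, so each remaining variable appears in exactly one parameter.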