Self-similarity of parallel machines
Parallel Computing
This paper applies the Pi Theorem of dimensional analysis to a representative set of examples from computer performance analysis. It is a survey paper that takes a different look at problems involving latency, bandwidth, cache-miss ratios, and the efficiency of parallel numerical algorithms. The Pi Theorem is the fundamental tool of dimensional analysis, and it applies to problems in computer performance analysis just as well as it does to problems in other sciences. Applying it requires defining a system of measurement appropriate for computer performance analysis, with a consistent set of units and dimensions. A straightforward recipe then reduces the independent variables of each specific problem to a smaller number of dimensionless parameters. Two machines with the same values of these parameters are self-similar and behave the same way. Self-similarity relationships emphasize how machines are the same rather than how they are different. The Pi Theorem is simple to state and simple to prove, using purely algebraic methods, but the results that follow from it are often unexpected and far from simple, and they almost always reveal something new about the problem at hand.
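As a rough illustration of the recipe described in the abstract, the sketch below applies the Pi Theorem to a simple latency/bandwidth message model. The model itself (transfer time t = t0 + n/r), the choice of seconds and bytes as base dimensions, and all function and variable names are assumptions made for this example; they are not taken from the paper. The count of dimensionless parameters is the number of variables minus the rank of the dimension matrix.

```python
import math
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix, by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Assumed variables (columns): t (transfer time), t0 (latency),
# n (message size), r (bandwidth). Rows hold the exponent of each base dimension.
DIMENSION_MATRIX = [
    [1, 1, 0, -1],  # seconds: t ~ s, t0 ~ s, r ~ bytes/s
    [0, 0, 1, 1],   # bytes:   n ~ bytes, r ~ bytes/s
]


def num_dimensionless_parameters(matrix):
    """Pi Theorem count: number of variables minus rank of the dimension matrix."""
    return len(matrix[0]) - rank(matrix)


def transfer_time(t0, n, r):
    """Assumed model: latency plus message size divided by bandwidth."""
    return t0 + n / r


def pi_groups(t0, n, r):
    """One possible pair of dimensionless parameters: (t / t0, n / (r * t0))."""
    return transfer_time(t0, n, r) / t0, n / (r * t0)


def self_similar(machine_a, machine_b):
    """True when two (t0, n, r) configurations share the same dimensionless parameters."""
    return all(
        math.isclose(a, b) for a, b in zip(pi_groups(*machine_a), pi_groups(*machine_b))
    )
```

Under this assumed model, four variables in two base dimensions leave two dimensionless parameters. A hypothetical machine with 1 microsecond latency and 1 GB/s bandwidth, and another with 5 microseconds latency and 200 MB/s bandwidth, both sending 1000-byte messages, share the value n/(r·t0) = 1 and so land on the same point of the dimensionless curve t/t0 = 1 + n/(r·t0), even though their raw transfer times differ by a factor of five.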