Two algorithms for barrier synchronization
International Journal of Parallel Programming
ACM Transactions on Computer Systems (TOCS)
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
A scalable cross-platform infrastructure for application performance tuning using hardware counters
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Predictive performance and scalability modeling of a large-scale application
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Modeling the Communication Performance of the IBM SP2
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
A General Performance Model for Parallel Sweeps on Orthogonal Grids for Particle Transport Calculations
Mambo: a full system simulator for the PowerPC architecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
The structural simulation toolkit: exploring novel architectures
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
SKaMPI: a comprehensive benchmark for public benchmarking of MPI
Scientific Programming
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
LogGOPSim: simulating large-scale applications in the LogGOPS model
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Toward performance models of MPI implementations for understanding application scaling issues
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Parallel zero-copy algorithms for fast Fourier transform and conjugate gradient using MPI datatypes
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
The PERCS High-Performance Interconnect
HOTI '10 Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects
ICPADS '10 Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems
Bridging performance analysis tools and analytic performance modeling for HPC
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Netgauge: a network performance measurement framework
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Dataflow-driven GPU performance projection for multi-kernel transformations
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Aspen: a domain specific language for performance modeling
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Using automated performance modeling to find scalability bugs in complex codes
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Modeling synthetic aperture radar computation with Aspen
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
The performance of parallel scientific applications depends on many factors which are determined by the execution environment and the parallel application. Especially on large parallel systems, it is too expensive to explore the solution space with series of experiments. Deriving analytical models for applications and platforms allow estimating and extrapolating their execution performance, bottlenecks, and the potential impact of optimization options. We propose to use such "performance modeling" techniques beginning from the application design process throughout the whole software development cycle and also during the lifetime of supercomputer systems. Such models help to guide supercomputer system design and re-engineering efforts to adopt applications to changing platforms and allow users to estimate costs to solve a particular problem. Models can often be built with the help of well-known performance profiling tools. We discuss how we successfully used modeling throughout the proposal, initial testing, and beginning deployment phase of the Blue Waters supercomputer system.