Cycle-accurate simulation is far too slow for modeling the expected performance of full parallel applications on large HPC systems. Yet simply running an application on a system and observing wall-clock time tells you nothing about why the application performs as it does (and is impossible anyway on yet-to-be-built systems). Here we present a framework for performance modeling and prediction that is faster than cycle-accurate simulation and more informative than simple benchmarking. We show that it is useful for performance investigations in several dimensions.