Adaptation in natural and artificial systems
Adaptation in natural and artificial systems
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Analysis of benchmark characteristics and benchmark performance prediction
ACM Transactions on Computer Systems (TOCS)
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Semi-empirical multiprocessor performance predictions
Journal of Parallel and Distributed Computing
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Genetic Algorithms in Search, Optimization and Machine Learning
Genetic Algorithms in Search, Optimization and Machine Learning
Parallel performance prediction using lost cycles analysis
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Modeling the Communication Performance of the IBM SP2
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Accurate Performance Prediction for Assively Parallel Systems and Its Applications
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
A framework for performance modeling and prediction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Scalability Analysis of Multidimensional Wavefront Algorithms on Large-Scale SMP Clusters
FRONTIERS '99 Proceedings of the The 7th Symposium on the Frontiers of Massively Parallel Computation
Conventional Benchmarks as a Sample of the Performance Spectrum
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences-Volume 7 - Volume 7
Integrated Compilation and Scalability Analysis for Parallel Systems
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Cross-architecture performance predictions for scientific applications using parameterized models
Proceedings of the joint international conference on Measurement and modeling of computer systems
Architecture Independent Performance Characterization and Benchmarking for Scientific Applications
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
How Well Can Simple Metrics Represent the Performance of HPC Applications?
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Performance Profiling and Analysis of DoD Applications Using PAPI and TAU
DOD_UGC '05 Proceedings of the 2005 Users Group Conference on 2005 Users Group Conference
Efficiently exploring architectural design spaces via predictive modeling
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 03
Detection and Parallel Execution of Independent Instructions
IEEE Transactions on Computers
Identification of performance characteristics from multi-view trace analysis
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Accurate memory signatures and synthetic address traces for HPC applications
Proceedings of the 22nd annual international conference on Supercomputing
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
Improving SMT performance: an application of genetic algorithms to configure resizable caches
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
An idiom-finding tool for increasing productivity of accelerators
Proceedings of the international conference on Supercomputing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Performance analysis of HPC applications on low-power embedded platforms
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Benchmarks that measure memory bandwidth, such as STREAM, Apex-MAPS and MultiMAPS, are increasingly popular due to the "Von Neumann" bottleneck of modern processors which causes many calculations to be memory-bound. We present a scheme for predicting the performance of HPC applications based on the results of such benchmarks. A Genetic Algorithm approach is used to "learn" bandwidth as a function of cache hit rates per machine with MultiMAPS as the fitness test. The specific results are 56 individual performance predictions including 3 full-scale parallel applications run on 5 different modern HPC architectures, with various CPU counts and inputs, predicted within 10% average difference with respect to independently verified runtimes.