Semicoarsening Multigrid on Distributed Memory Machines
SIAM Journal on Scientific Computing
Machine Learning
Predictive performance and scalability modeling of a large-scale application
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
hypre: A Library of High Performance Preconditioners
ICCS '02 Proceedings of the International Conference on Computational Science-Part III
Cross-architecture performance predictions for scientific applications using parameterized models
Proceedings of the joint international conference on Measurement and modeling of computer systems
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
A performance prediction framework for scientific applications
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Designing a highly-scalable operating system: the Blue Gene/L story
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Methods of inference and learning for performance modeling of parallel applications
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Analysis of input-dependent program behavior using active profiling
Proceedings of the 2007 workshop on Experimental computer science
Analysis of input-dependent program behavior using active profiling
ecs'07 Experimental computer science on Experimental computer science
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
The blue gene/L supercomputer: a hardware and software story
International Journal of Parallel Programming
A regression-based approach to scalability prediction
Proceedings of the 22nd annual international conference on Supercomputing
Efficient system design space exploration using machine learning techniques
Proceedings of the 45th annual Design Automation Conference
Mapping parallelism to multi-cores: a machine learning based approach
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A Hybrid Intelligent Method for Performance Modeling and Prediction of Workflow Activities in Grids
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Optimizing MPI Runtime Parameter Settings by Using Machine Learning
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Accelerating multi-core simulators
Proceedings of the 2010 ACM Symposium on Applied Computing
Rapid early-stage microarchitecture design using predictive models
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
A workload-aware mapping approach for data-parallel programs
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Compiler-Directed performance model construction for parallel programs
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
ACM Transactions on Embedded Computing Systems (TECS)
Estimating parallel performance
Journal of Parallel and Distributed Computing
Using automated performance modeling to find scalability bugs in complex codes
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Gunther: search-based auto-tuning of mapreduce
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
Accurately modeling and predicting performance for large-scale applications becomes increasingly difficult as system complexity scales dramatically. Analytic predictive models are useful, but are difficult to construct, usually limited in scope, and often fail to capture subtle interactions between architecture and software. In contrast, we employ multilayer neural networks trained on input data from executions on the target platform. This approach is useful for predicting many aspects of performance, and it captures full system complexity. Our models are developed automatically from the training input set, avoiding the difficult and potentially error-prone process required to develop analytic models. This study focuses on the high-performance, parallel application SMG2000, a much studied code whose variations in execution times are still not well understood. Our model predicts performance on two large-scale parallel platforms within 5%-7% error across a large, multi-dimensional parameter space.