Adapt or become extinct!: the case for a unified framework for deployment-time optimization (position paper)

Authors:
Georgios Goumas;Sally A. McKee;Magnus Själander;Thomas R. Gross;Sven Karlsson;Christian W. Probst;Lixin Zhang
Affiliations:
National Technical University of Athens, Athens, Greece;Chalmers University of Technology, Gothenburg, Sweden;Chalmers University of Technology, Gothenburg, Sweden;ETH Zürich, Zürich, Switzerland;Technical University of Denmark, Kongens Lyngby, Denmark;Technical University of Denmark, Kongens Lyngby, Denmark;National Research Center of High Performance Computers, Beijing, China
Venue:
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Year:
2011

Abstract

The High-Performance Computing ecosystem comprises a large variety of execution platforms whose hardware characteristics differ widely: CPU architecture, memory organization, interconnection network, accelerators, and so on. This environment also confronts applications with a number of hard boundaries (walls) that limit software development (the parallel programming wall), performance (the memory wall and the communication wall), and viability (the power wall). The only way to survive in such a demanding environment is to adapt. In this paper we discuss how dynamic information collected during the execution of an application can be used to adapt the execution context, and how it may lead to performance gains beyond those provided by static information and compile-time adaptation. We consider specialization based on dynamic information such as user input, architectural characteristics such as the organization of the memory hierarchy, and the execution profile of the application as obtained from the execution platform's performance monitoring units. One of the challenges for future execution platforms is to allow the seamless integration of these various kinds of information with information obtained from static analysis (during either ahead-of-time or just-in-time compilation). We extend the notion of information-driven adaptation and outline the architecture of an infrastructure designed to enable information flow and adaptation throughout the life cycle of an application.
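
As a rough illustration of adaptation driven by dynamic information, the sketch below times several functionally equivalent variants of a kernel on the input actually seen at run time and keeps the fastest one. It is not the infrastructure outlined in the paper: the names `sum_builtin`, `sum_loop`, `measure`, and `select_variant` are hypothetical, and wall-clock timing stands in for the hardware performance-monitoring data the abstract refers to.

```python
import time
from typing import Callable, Dict, Sequence

Kernel = Callable[[Sequence[float]], float]


def sum_builtin(values: Sequence[float]) -> float:
    """Variant 1 (hypothetical): rely on the built-in reduction."""
    return sum(values)


def sum_loop(values: Sequence[float]) -> float:
    """Variant 2 (hypothetical): explicit accumulation loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def measure(fn: Kernel, data: Sequence[float], repeats: int = 3) -> float:
    """Return the best wall-clock time of `fn` on `data` over `repeats` runs.

    Assumption: wall-clock time is a stand-in for richer dynamic
    information such as hardware performance counters.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def select_variant(
    variants: Dict[str, Kernel],
    sample: Sequence[float],
    timer: Callable[[Kernel, Sequence[float]], float] = measure,
) -> str:
    """Pick the name of the variant with the lowest measured cost on `sample`."""
    timings = {name: timer(fn, sample) for name, fn in variants.items()}
    return min(timings, key=timings.get)


if __name__ == "__main__":
    variants = {"builtin": sum_builtin, "loop": sum_loop}
    data = [float(i) for i in range(100_000)]
    chosen = select_variant(variants, data)
    print("selected variant:", chosen)
    print("result:", variants[chosen](data))
```

The `timer` parameter is injectable so that the selection policy can be driven by another information source (for example a profile recorded during an earlier run) without changing `select_variant`.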