Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems

Authors:
Jeffrey J. Evans;Charles E. Lucas
Affiliations:
Purdue University, West Lafayette, USA 47907;PC Krause and Associates, Inc., West Lafayette, USA 47906
Venue:
Cluster Computing
Year:
2013



Abstract

Run time variability of parallel applications continues to present significant challenges to their performance and energy efficiency in high-performance computing (HPC) systems. When run times are extended and unpredictable, application developers perceive this as a degradation of system (or subsystem) performance. Extended run times contribute directly to proportionally higher energy consumption, potentially negating efforts by applications, or by the HPC system, to optimize energy consumption using low-level control techniques such as dynamic voltage and frequency scaling (DVFS). Therefore, successful systemic management of application run time performance can result in less wasted energy, or even energy savings.

We have been studying run time variability in terms of communication time, from the perspective of the application, focusing on the interconnection network. More recently, our focus has shifted to developing a more complete understanding of the effects of HPC subsystem interactions on parallel applications. In this context, the set of applications executing on the HPC system is treated as a subsystem, alongside more traditional subsystems such as the communication subsystem and the storage subsystem.

To gain insight into the run time variability problem, our earlier work developed a framework to emulate parallel applications (PACE) that stresses the communication subsystem. The run time sensitivity of real applications to network performance is evaluated with a tool called PARSE, which uses PACE. In this paper, we propose a model that defines application-level behavioral attributes, which collectively describe how applications behave in terms of their run time performance as functions of their process distribution on the system (spatial locality) and of subsystem interactions (communication subsystem degradation). These subsystem interactions are produced when multiple applications execute concurrently on the same HPC system.

We also revisit our evaluation framework and tools to demonstrate the flexibility of our application characterization techniques and the ease with which attributes can be quantified. The validity of the model is demonstrated using our tools with several parallel benchmarks and application fragments. Results suggest that it is possible to articulate application-level behavioral attributes as a tuple of numeric values that describe coarse-grained performance behavior.
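
As a purely illustrative sketch of what "a tuple of numeric values" could look like, the Python snippet below derives two relative-slowdown figures from measured run times. The abstract does not give the paper's actual attribute definitions, so the field names, the `relative_slowdown` formula, and the sample timings are all assumptions made for illustration, not the authors' model.

```python
from typing import NamedTuple


class BehavioralAttributes(NamedTuple):
    """Hypothetical attribute tuple; the field names are illustrative only."""
    locality_sensitivity: float      # slowdown when processes are spread across the system
    degradation_sensitivity: float   # slowdown when the communication subsystem is degraded


def relative_slowdown(baseline_s: float, observed_s: float) -> float:
    """Fractional run time increase relative to a baseline run (assumed metric)."""
    if baseline_s <= 0:
        raise ValueError("baseline run time must be positive")
    return (observed_s - baseline_s) / baseline_s


def characterize(baseline_s: float, spread_s: float, degraded_s: float) -> BehavioralAttributes:
    """Build the attribute tuple from three run time measurements, in seconds."""
    return BehavioralAttributes(
        locality_sensitivity=relative_slowdown(baseline_s, spread_s),
        degradation_sensitivity=relative_slowdown(baseline_s, degraded_s),
    )


if __name__ == "__main__":
    # Made-up timings: compact placement, spread-out placement, degraded network.
    attrs = characterize(baseline_s=100.0, spread_s=110.0, degraded_s=150.0)
    print(attrs)
```

Under these assumptions, an application whose two values stay near zero would be largely insensitive to placement and to network contention, while larger values would flag it as a candidate for more careful scheduling.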