The design and evaluation of high-performance computers has concentrated on increasing computational speed for applications, and this performance is often measured on a well-configured, dedicated system to show the best case. In real environments, resources are rarely dedicated to a single task; systems run jobs that influence one another, so run times vary, sometimes to an unreasonably large extent. This paper systematically explores the amount of run-time variation observed across four large distributed-memory systems, analyzes the causes of that variation, and discusses what can be done to reduce it without sacrificing performance.
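As a minimal illustration of how run-to-run variation can be quantified (a sketch, not the paper's actual methodology), the Python snippet below repeatedly times a command and reports the mean, standard deviation, and coefficient of variation of its wall-clock run times. The command name and repetition count are placeholders.

import statistics
import subprocess
import time

# Hypothetical benchmark command; replace with the application under study.
COMMAND = ["./my_benchmark"]
RUNS = 20  # number of repeated runs (an assumed value, not from the paper)

def time_command(cmd):
    """Run cmd once and return its wall-clock run time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

times = [time_command(COMMAND) for _ in range(RUNS)]

mean = statistics.mean(times)
stdev = statistics.stdev(times)
# Coefficient of variation: run-to-run variability relative to the mean.
cv = stdev / mean

print(f"mean={mean:.3f}s stdev={stdev:.3f}s cv={cv:.2%}")
print(f"min={min(times):.3f}s max={max(times):.3f}s "
      f"spread={(max(times) - min(times)) / min(times):.2%}")

On a dedicated system the coefficient of variation is typically small; on a shared system, interference from nearby jobs tends to inflate both the spread and the tail of the run-time distribution.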