Enabling fair pricing on HPC systems with node sharing

Authors:
Alex D. Breslow;Ananta Tiwari;Martin Schulz;Laura Carrington;Lingjia Tang;Jason Mars
Affiliations:
University of California, San Diego, CA;San Diego Supercomputer Center, La Jolla, CA;Lawrence Livermore National Laboratory, Livermore, CA;San Diego Supercomputer Center, La Jolla, CA;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10 to 20%. However, system operators disallow co-location because of fair-pricing concerns, namely the lack of a pricing mechanism that accounts for performance interference from co-running jobs. Under the current pricing model, application execution time determines the price, so the minority of users whose jobs are slowed by co-location pay unfair prices. This paper presents POPPA, a runtime system that enables fair pricing through precise online interference detection and thereby facilitates the adoption of co-location on supercomputers. POPPA leverages a novel shutter mechanism, a cyclic, fine-grained interference sampling mechanism that accurately deduces the interference between co-runners, to provide unbiased pricing of jobs that share nodes. POPPA quantifies inter-application interference to within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.
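
The pricing idea can be illustrated with a minimal sketch. This is not POPPA's implementation; the abstract does not give its formulas. The sketch assumes, hypothetically, that each sampling cycle yields two progress-rate measurements for a job (one while co-runners execute, one from a brief window in which the job runs without interference), that slowdown is estimated as the ratio of their means, and that the price is discounted by that slowdown. The names `ShutterSample`, `estimate_slowdown`, and `fair_price` are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ShutterSample:
    """One hypothetical sampling cycle (rates in instructions per second)."""
    corun_rate: float  # progress rate measured while co-runners execute
    solo_rate: float   # progress rate measured in the interference-free window


def estimate_slowdown(samples):
    """Estimate slowdown as mean solo rate divided by mean co-run rate.

    Assumption for this sketch: a value of 1.0 means no interference,
    and estimates below 1.0 are clamped to 1.0.
    """
    if not samples:
        raise ValueError("need at least one sample")
    solo = mean(s.solo_rate for s in samples)
    corun = mean(s.corun_rate for s in samples)
    if corun <= 0:
        raise ValueError("co-run rate must be positive")
    return max(1.0, solo / corun)


def fair_price(elapsed_hours, rate_per_hour, slowdown):
    """Charge for the time the job would have needed without interference."""
    if slowdown < 1.0:
        raise ValueError("slowdown must be at least 1.0")
    return rate_per_hour * elapsed_hours / slowdown


if __name__ == "__main__":
    cycles = [ShutterSample(corun_rate=80.0, solo_rate=100.0),
              ShutterSample(corun_rate=80.0, solo_rate=100.0)]
    s = estimate_slowdown(cycles)
    print(f"estimated slowdown: {s:.2f}")
    print(f"discounted price:   {fair_price(10.0, 2.0, s):.2f}")
```

Under this scheme a job that is not slowed pays the full time-based price, while a job that suffers interference is charged only for its estimated solo execution time.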