Feedback-controlled resource sharing for predictable eScience

Authors:
Sang-Min Park; Marty Humphrey
Affiliations:
University of Virginia, Charlottesville, VA; University of Virginia, Charlottesville, VA
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

The emerging class of adaptive, real-time, data-driven applications poses a significant problem for today's HPC systems. In general, it is extremely difficult for queuing-system-controlled HPC resources to make, and then guarantee, a tightly bounded prediction of when a newly submitted application will execute. A reservation-based approach partially addresses the problem, but it can cause severe resource under-utilization (unused reservations, necessary scheduled idle slots, underutilized reservations, etc.) that resource providers are eager to avoid. In contrast, this paper presents a fundamentally different approach to guaranteeing predictable execution. We create a virtualized application layer, called the performance container, and opportunistically multiplex concurrent performance containers by applying formal feedback control theory. This lets us regulate each job's progress so that it meets its deadline without requiring exclusive access to resources, even in the presence of a wide class of unexpected disturbances. Our evaluation with two widely used applications, WRF and BLAST, on an 8-core server shows that our approach is predictable and meets deadlines with an average error of 3.4% while achieving high overall utilization.
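To make the idea of feedback-regulated progress concrete, here is a toy simulation. It is a minimal sketch under assumptions of our own, not the paper's controller. We model a single performance container whose progress per step is proportional to its CPU share, inject one slowdown disturbance partway through the run, and use a simple discrete integral controller that nudges the share toward the progress rate needed to finish a few steps before the deadline. The function `simulate` and every parameter (`gain`, `margin`, `base_rate`, `slowdown`, etc.) are hypothetical illustrations. The paper's actual sensors, actuators, and control law may differ.

```python
"""Toy deadline-tracking feedback loop for one performance container.

Hypothetical model (not the paper's): progress per step is
base_rate * efficiency * share, where `share` is the container's CPU
fraction and `efficiency` drops at `disturbance_at` to mimic interference.
"""


def simulate(gain=0.5, deadline=100, margin=5, work=1.0, base_rate=0.02,
             disturbance_at=40, slowdown=0.6, max_steps=200):
    """Return (finish_step or None, list of CPU shares applied each step)."""
    share, done, shares = 0.5, 0.0, []
    target = deadline - margin  # aim to finish early to leave some slack
    for t in range(max_steps):
        if done >= work:
            return t, shares
        # Sensor: progress rate needed to finish the remaining work by `target`.
        required_rate = (work - done) / max(target - t, 1)
        # Plant: the container advances in proportion to its share; the
        # disturbance lowers efficiency from `disturbance_at` onward.
        efficiency = slowdown if t >= disturbance_at else 1.0
        progress = base_rate * efficiency * share
        done += progress
        shares.append(share)
        # Controller: integral action on the rate error (in share units),
        # clamped to a valid CPU fraction.
        error = (required_rate - progress) / base_rate
        share = min(max(share + gain * error, 0.05), 1.0)
    return None, shares


if __name__ == "__main__":
    for g in (0.0, 0.5):  # gain 0.0 keeps the share fixed (open loop)
        finish, shares = simulate(gain=g)
        print(f"gain={g}: finished at step {finish}, peak share {max(shares):.2f}")
```

In this toy model, the fixed 0.5 share falls behind once the slowdown hits and finishes well after the deadline. The integral controller instead raises the share only as much as the measured shortfall requires. This mirrors, in a very simplified form, the abstract's point that a job can be kept on schedule without reserving the whole machine for it.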