This paper addresses the scheduling problem facing popular data-parallel programming systems such as DryadLINQ and MapReduce. Designing a cluster system for a multi-user environment is challenging because cluster schedulers must satisfy multiple, possibly conflicting, enterprise goals and policies. For these new data-intensive applications in particular, it remains difficult to achieve both high throughput and predictable end-to-end job performance (e.g., predictable start and end times). The conventional approach is to determine the best mapping between tasks and nodes before a job executes; once the job starts, the scheduling system is no longer involved. In contrast, this paper defines a reactive containment and control mechanism for scheduling and executing distributed tasks: the system schedules the jobs and then continually monitors and adjusts resources as they execute. Specifically, each DryadLINQ task in our system is contained in a virtual machine, and distributed controllers regulate the task's progress at runtime. Using online, feedback-controlled VM CPU scheduling, our system lets a job speed up or slow down the progress of its concurrent sub-tasks so that the job makes predictable progress while sharing system resources with other jobs. This capability allows an enterprise to enforce flexible scheduling policies such as fair sharing, job prioritization, or both. Our evaluation with five well-known DryadLINQ applications shows that the implemented distributed controllers achieve high throughput as well as predictable end-to-end performance.
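To make the idea of feedback-controlled VM CPU scheduling concrete, the following is a minimal sketch of one possible per-task control loop. It is not the controller described in the paper: the integral control law, the gain, the cap limits, and the toy model in which progress rate equals `efficiency * cap` are all assumptions chosen for illustration, and the names `ProgressController` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgressController:
    """Hypothetical integral controller for one task's VM CPU cap."""
    target_rate: float     # desired progress per control interval
    gain: float = 0.5      # integral gain (assumed value)
    min_cap: float = 0.05  # lowest CPU share the VM may receive (assumed)
    max_cap: float = 1.0   # highest CPU share the VM may receive (assumed)
    cap: float = 0.5       # current CPU share granted to the VM

    def update(self, measured_rate: float) -> float:
        """Adjust the cap in proportion to the progress error."""
        error = self.target_rate - measured_rate
        new_cap = self.cap + self.gain * error
        # Clamp so the controller never requests an impossible share.
        self.cap = min(self.max_cap, max(self.min_cap, new_cap))
        return self.cap


def simulate(controller: ProgressController, efficiency: float, steps: int):
    """Toy plant (assumption): progress rate = efficiency * CPU cap."""
    rates = []
    for _ in range(steps):
        rate = efficiency * controller.cap  # progress observed this interval
        rates.append(rate)
        controller.update(rate)             # feedback step
    return rates


if __name__ == "__main__":
    ctrl = ProgressController(target_rate=0.3)
    history = simulate(ctrl, efficiency=0.5, steps=60)
    print(f"final rate={history[-1]:.4f}, final cap={ctrl.cap:.4f}")
```

In this toy model, a task that progresses more slowly than its target is granted a larger CPU share, and a task that runs ahead is throttled, which frees capacity for other jobs. When the target is unreachable even at the maximum share, the cap saturates at `max_cap`; a real system would need a policy for that case, such as admission control or revised deadlines.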