Efficient on-demand operations in dynamic distributed infrastructures

Authors:
Steven Y. Ko; Indranil Gupta
Affiliations:
University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign
Venue:
LADIS '08 Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware
Year:
2008

Abstract

In a large-scale distributed infrastructure, users and administrators typically want to perform on-demand operations that act upon the most up-to-date state of the infrastructure. These on-demand operations range from monitoring the current properties of machines in the infrastructure to making Grid scheduling decisions for different tasks based on the current status of the infrastructure. However, the scale and dynamism of the operating environment make it challenging to support these operations efficiently. This paper discusses several on-demand operations that we have been studying, the challenges associated with them, and how to meet those challenges. Specifically, we build techniques for: 1) on-demand group monitoring, which allows users and administrators of an infrastructure to query and aggregate the up-to-date state of the machines (e.g., CPU utilization) in one or more groups; 2) an on-demand Grid scheduling strategy that makes scheduling decisions based on the current availability of compute nodes; and 3) another on-demand Grid scheduling strategy that chooses, among multiple available algorithms, the best one for the current input data set. We also present our ongoing work.
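
As a rough illustration of what an on-demand group query and an availability-based scheduling choice might look like, the following minimal Python sketch aggregates an attribute over the machines in a group at query time and picks the least-loaded node in a group. It is not the paper's system: the `Machine` class, the functions `query_group` and `pick_least_loaded`, and the sample `fleet` are all hypothetical names introduced here, and the sketch runs in a single process rather than over a distributed infrastructure.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Optional


@dataclass
class Machine:
    """Hypothetical in-memory stand-in for a node in the infrastructure."""
    name: str
    groups: set[str]
    cpu_utilization: float  # fraction in [0, 1], assumed to be read at query time


def query_group(
    machines: Iterable[Machine],
    group: str,
    attribute: str,
    aggregate: Callable[[list[float]], float],
) -> Optional[float]:
    """Aggregate one attribute over the current members of a group.

    Returns None when the group has no members.
    """
    values = [getattr(m, attribute) for m in machines if group in m.groups]
    return aggregate(values) if values else None


def pick_least_loaded(machines: Iterable[Machine], group: str) -> Optional[Machine]:
    """Illustrative scheduling choice: the group member with the lowest CPU use."""
    candidates = [m for m in machines if group in m.groups]
    return min(candidates, key=lambda m: m.cpu_utilization, default=None)


# Hypothetical sample data for the usage example.
fleet = [
    Machine("n1", {"web"}, 0.9),
    Machine("n2", {"web", "db"}, 0.5),
    Machine("n3", {"db"}, 0.25),
]

print(query_group(fleet, "web", "cpu_utilization", max))
print(query_group(fleet, "db", "cpu_utilization", mean))
print(pick_least_loaded(fleet, "web"))
```

In this sketch, values are read only when a query arrives, which mirrors the on-demand framing of the abstract; how a real deployment would reach the group members and collect their answers efficiently is the problem the paper addresses and is not modeled here.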