In a large-scale distributed infrastructure, users and administrators often want to perform on-demand operations that act on the most up-to-date state of the infrastructure. These operations range from monitoring current machine properties to making Grid scheduling decisions for different tasks based on the current status of the infrastructure. However, the scale and dynamism of the operating environment make it challenging to support these operations efficiently. This paper discusses several on-demand operations that we have been studying, the challenges associated with them, and how to meet those challenges. Specifically, we build techniques for 1) on-demand group monitoring, which allows users and administrators of an infrastructure to query and aggregate the up-to-date state of the machines (e.g., CPU utilization) in one or more groups; 2) an on-demand Grid scheduling strategy that makes scheduling decisions based on the current availability of compute nodes; and 3) another on-demand Grid scheduling strategy that chooses the best algorithm for the current input data set from among multiple available algorithms. We also present our ongoing work.