Probabilistic resource allocation in heterogeneous distributed systems with random failures

Authors:
Vladimir Shestak;Edwin K. P. Chong;Anthony A. Maciejewski;Howard Jay Siegel
Affiliations:
Ricoh InfoPrint Solutions, 6300 Diagonal Highway, Boulder, CO 80301, United States;Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373, United States and Department of Mathematics, Colorado State University, Fort Collins, CO ...;Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373, United States;Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373, United States and Department of Computer Science, Colorado State University, Fort Collins ...
Venue:
Journal of Parallel and Distributed Computing
Year:
2012

Abstract

The problem of finding efficient workload distribution techniques is becoming increasingly important for heterogeneous distributed systems in which the availability of compute nodes may change spontaneously over time. Resource-allocation policies designed for such systems should maximize performance and, at the same time, be robust against the failure and recovery of compute nodes. Such a policy, based on the concepts of the Derman-Lieberman-Ross theorem, is proposed in this work and applied to a simulated model of a dedicated system composed of a set of heterogeneous image processing servers. Assuming that each image yields a "reward" if its processing is completed before a certain deadline, the goal of the resource allocation policy is to maximize the expected cumulative reward. An extensive analysis was conducted to study the performance of the proposed policy and to compare it with the performance of some existing policies adapted to this environment. Our experiments, conducted for various types of task-machine heterogeneity, illustrate the potential of our method for solving resource allocation problems in a broad spectrum of distributed systems that experience high failure rates.
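
For readers unfamiliar with the Derman-Lieberman-Ross theorem, the following is a minimal sketch of the classical result as it is commonly stated, not the policy proposed in the paper. In the classical setting, n items with i.i.d. values arrive one at a time and each must be assigned immediately to one of n remaining workers; the optimal rule sends an item whose value falls in the i-th threshold interval to the worker with the i-th smallest success value, and the thresholds obey the recursion a(i, n+1) = E[min(max(X, a(i-1, n)), a(i, n))]. The discrete value distribution and the function names `dlr_thresholds` and `assign_rank` are assumptions made for illustration only.

```python
import bisect
import math


def dlr_thresholds(values, probs, n):
    """Thresholds [a_0, ..., a_n] for n remaining assignments.

    Assumes item values follow a discrete distribution given by
    `values` and matching `probs` (an illustrative assumption).
    """
    a = [-math.inf, math.inf]  # with one assignment left there is one interval
    for _ in range(n - 1):
        nxt = [-math.inf]
        for lo, hi in zip(a, a[1:]):
            # Expected value of X clipped to the interval [lo, hi].
            nxt.append(sum(p * min(max(x, lo), hi)
                           for x, p in zip(values, probs)))
        nxt.append(math.inf)
        a = nxt
    return a


def assign_rank(x, thresholds):
    """1-based rank i with a_{i-1} < x <= a_i: use the i-th smallest worker."""
    return bisect.bisect_left(thresholds, x)


if __name__ == "__main__":
    th = dlr_thresholds([1, 2, 3, 4], [0.25, 0.25, 0.25, 0.25], 3)
    print(th)
    print(assign_rank(3.5, th))
```

Under these assumptions, an item of value 3.5 arriving with three assignments left goes to the worker with the largest success value. The thresholds depend only on the value distribution and not on the workers' success values, which is what makes the rule inexpensive to apply online.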