Computation-at-risk: employing the grid for computational risk management

  • Authors:
  • S. D. Kleban; S. H. Clearwater

  • Affiliations:
  • Sandia Nat. Labs., Albuquerque, NM, USA; San Diego Supercomput. Center, CA, USA

  • Venue:
  • CLUSTER '04: Proceedings of the 2004 IEEE International Conference on Cluster Computing
  • Year:
  • 2004

Quantified Score

Hi-index 0.01

Abstract

This work expands on our earlier work on the concept of computation-at-risk (CaR). CaR refers to the risk that certain computations may not be completed in a timely manner. We examine a number of CaR distributions on several large clusters. The main contribution of this work is to show that CaR-reducing strategies exist and that, by employing them, a facility can significantly reduce the risk of inefficient resource utilization. Grids are shown to be one means of employing a CaR-reducing strategy. For example, we show that a CaR-reducing strategy applied to a common queue can have a dramatic effect on the wait times for jobs on a grid of clusters. In particular, we define a CaR Sharpe rule, a decision rule for determining the best machine in a grid on which to place a new job.
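
The abstract does not spell out how CaR or the CaR Sharpe rule is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' definition. It assumes, by analogy with value-at-risk and the Sharpe ratio in finance, that CaR can be read as a high quantile of a machine's historical wait times and that a Sharpe-like score divides the improvement over a baseline wait by the spread of waits. The function names, the baseline choice (the grid-wide mean wait), and the sample data are all hypothetical.

```python
import math
import statistics


def computation_at_risk(wait_times, confidence=0.95):
    """Assumed CaR: the empirical wait-time quantile at `confidence`
    (nearest-rank method). Not taken from the paper."""
    ordered = sorted(wait_times)
    rank = max(1, math.ceil(confidence * len(ordered)))
    return ordered[rank - 1]


def sharpe_like_score(wait_times, baseline):
    """Assumed Sharpe-style score: (baseline - mean wait) / spread of waits.
    Higher is better: shorter and more predictable waits."""
    mean_wait = statistics.mean(wait_times)
    spread = statistics.pstdev(wait_times)
    if spread == 0:
        # Perfectly predictable machine: rank purely on the sign of the gain.
        if mean_wait == baseline:
            return 0.0
        return math.inf if mean_wait < baseline else -math.inf
    return (baseline - mean_wait) / spread


def pick_machine(history):
    """Return the machine name with the highest Sharpe-like score.
    The baseline is the mean wait over every job on the grid (an assumption)."""
    all_waits = [w for waits in history.values() for w in waits]
    baseline = statistics.mean(all_waits)
    return max(history, key=lambda name: sharpe_like_score(history[name], baseline))


# Made-up wait times (minutes) for three hypothetical clusters.
history = {
    "cluster_a": [10, 12, 11, 13, 50],   # lowest mean, but one long wait
    "cluster_b": [20, 21, 19, 22, 20],   # slightly higher mean, very steady
    "cluster_c": [5, 90, 8, 120, 7],     # erratic
}

best = pick_machine(history)
print(best, {name: computation_at_risk(w, 0.9) for name, w in history.items()})
```

With these invented numbers the steadier cluster is preferred over the one with the lowest average wait, which is the kind of risk-adjusted trade-off a Sharpe-style rule is meant to capture. A real deployment would need the paper's actual definitions and measured queue data.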