Achieving high availability and performance computing with an HA-OSCAR cluster

  • Authors:
  • Chokchai Box Leangsuksun;Lixin Shen;Tong Liu;Stephen L. Scott

  • Affiliations:
  • Computer Science Program, Louisiana Tech University, Ruston, LA;Computer Science Program, Louisiana Tech University, Ruston, LA;Computer Science Program, Louisiana Tech University, Ruston, LA;Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN

  • Venue:
  • Future Generation Computer Systems - Special issue: High-speed networks and services for data-intensive grids: The DataTAG project
  • Year:
  • 2005

Abstract

High availability (HA) computing has long attracted attention in enterprise and mission-critical systems. HA aims to maximize uptime, and thus complements high-performance computing (HPC) objectives. HA-OSCAR is a project that aims to improve HA in commercial-off-the-shelf (COTS)-based HPC environments. In this paper, we introduce a multihead HPC cluster architecture. Server redundancy is the initial key step toward reducing downtime. Two HA-OSCAR configurations, active-standby and active-active, are studied. We evaluate system dependability for both models. Stochastic Reward Nets (SRN) are used to model system availability. We describe our SRN modeling using the Stochastic Petri Net Package, and compute several results that characterize HA-OSCAR availability.
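To illustrate why head-node redundancy reduces downtime, the following is a minimal back-of-the-envelope sketch, not the SRN model from the paper. It assumes independent failures, independent repairs, and instantaneous failover, all of which a Stochastic Reward Net would model explicitly. The function names and the MTTF/MTTR figures are hypothetical and chosen only for illustration.

```python
HOURS_PER_YEAR = 8760.0


def node_availability(mttf_hours, mttr_hours):
    """Steady-state availability of one repairable node: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(node_avail, n_heads=2):
    """Availability of n redundant head nodes.

    Assumption: the service is up while at least one head node is up,
    failures are independent, and failover takes no time.
    """
    return 1.0 - (1.0 - node_avail) ** n_heads


def annual_downtime_hours(availability):
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical figures: a head node that fails every 5000 hours on average
# and takes 4 hours to repair.
single = node_availability(5000.0, 4.0)
dual = redundant_availability(single, n_heads=2)

print(f"single head: A = {single:.6f}, downtime = {annual_downtime_hours(single):.2f} h/year")
print(f"dual head:   A = {dual:.8f}, downtime = {annual_downtime_hours(dual):.4f} h/year")
```

Under these simplifying assumptions the sketch does not distinguish active-standby from active-active; the two differ in failover delay and load sharing, which is the kind of behavior a state-based model such as an SRN is suited to capture.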