A Fault-Tolerant Distributed Subcube Management Scheme for Hypercube Multicomputer Systems

  • Authors:
  • Yi-long Chen;Jyh-Charn Liu

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1995

Abstract

This paper proposes a fault-tolerant distributed subcube management scheme for hypercube multicomputer systems. Gracefully degradable subcube management is supported by a data structure, called the distributed subcube table (DST), and a fault-tolerant broadcast protocol, called the reliably synchronized broadcast (RSB). In an $n$-dimensional hypercube, the DST is the collection of $2^n$ local subcube tables (LSTs), $DST = \{LST_0, LST_1, \dots, LST_{2^n-1}\}$, where $LST_x$ is a bit-mapped table assigned to $N_x$, a fault-free node whose address is $x$. $LST_x$, $\forall x$, is $n+1$ bits long, and it records the status (free/busy) of certain subcubes adjacent to $N_x$. The RSB diagnoses and avoids faults during interprocessor communication to prevent faulty nodes from being allocated for job execution. In addition to possessing a fault-tolerant design, our scheme achieves performance comparable to or better than that of existing centralized schemes, as verified by extensive simulation.
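
To make the shape of the DST concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes only what the abstract states: each fault-free node holds an $(n+1)$-bit free/busy bitmap, and nodes of an $n$-cube are adjacent when their addresses differ in one bit. The names `LocalSubcubeTable`, `build_dst`, and `neighbors`, the 0 = free / 1 = busy encoding, and the treatment of bit positions as plain indices are all assumptions made for illustration; the abstract does not say which subcube each bit describes.

```python
from typing import Dict, Iterable, List


def neighbors(x: int, n: int) -> List[int]:
    """Addresses adjacent to node x in an n-cube (flip one address bit each)."""
    return [x ^ (1 << d) for d in range(n)]


class LocalSubcubeTable:
    """Hypothetical (n + 1)-bit free/busy bitmap held by one node.

    Assumption: bit i is an opaque index. The mapping from bits to
    specific adjacent subcubes is not given in the abstract.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.bits = 0  # assumed encoding: 0 = free, 1 = busy

    def _check(self, i: int) -> None:
        if not 0 <= i <= self.n:
            raise IndexError(f"bit {i} outside the {self.n + 1}-bit table")

    def set_busy(self, i: int) -> None:
        self._check(i)
        self.bits |= 1 << i

    def set_free(self, i: int) -> None:
        self._check(i)
        self.bits &= ~(1 << i)

    def is_busy(self, i: int) -> bool:
        self._check(i)
        return bool(self.bits >> i & 1)


def build_dst(n: int, faulty: Iterable[int] = ()) -> Dict[int, LocalSubcubeTable]:
    """Create one table per fault-free node address in an n-cube."""
    faulty_set = set(faulty)
    return {x: LocalSubcubeTable(n) for x in range(2 ** n) if x not in faulty_set}
```

For example, `build_dst(3)` yields eight tables of four bits each, and `build_dst(3, faulty=[5])` yields seven, since in this sketch a faulty node simply holds no table.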