A Fault-Tolerant Distributed Subcube Management Scheme for Hypercube Multicomputer Systems

Authors:
Yi-long Chen; Jyh-Charn Liu
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1995

Abstract

This paper proposes a fault-tolerant distributed subcube management scheme for hypercube multicomputer systems. Gracefully degradable subcube management is supported by a data structure called the distributed subcube table (DST) and a fault-tolerant broadcast protocol called the reliably synchronized broadcast (RSB). In an n-dimensional hypercube, the DST is the collection of $2^n$ local subcube tables (LSTs), $DST = \{LST_0, LST_1, \dots, LST_{2^n-1}\}$, where $LST_x$ is a bit-mapped table assigned to $N_x$, a fault-free node whose address is $x$. For every $x$, $LST_x$ is $n+1$ bits long and records the status (free/busy) of certain subcubes adjacent to $N_x$. The RSB diagnoses and avoids faults during interprocessor communication, which prevents faulty nodes from being allocated for job execution. In addition to its fault-tolerant design, our scheme achieves performance comparable to or better than that of existing centralized schemes, as verified by extensive simulation.
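To make the DST layout concrete, here is a minimal Python sketch of $2^n$ local tables, each $n+1$ bits long. The abstract does not say exactly which subcubes each bit covers. This sketch therefore **assumes** a buddy-style rule: bit $k$ ($0 \le k \le n$) of $LST_x$ is 1 when the $k$-dimensional subcube containing $x$ is entirely free and fault-free. That subcube is the set of nodes sharing $x$'s high-order $n-k$ address bits. The names `DistributedSubcubeTable`, `subcube_nodes`, `refresh`, `allocate`, and `release` are hypothetical. The `refresh` step recomputes every table centrally and only stands in for the RSB protocol, which it does not model.

```python
from typing import Dict, List, Optional, Set


def subcube_nodes(x: int, k: int) -> List[int]:
    """Addresses of the (assumed) buddy k-subcube containing node x:
    keep x's high-order bits, clear its low k bits, then enumerate them."""
    base = (x >> k) << k
    return [base | i for i in range(1 << k)]


class DistributedSubcubeTable:
    """Hypothetical DST: one (n+1)-bit bitmap per fault-free node of an n-cube."""

    def __init__(self, n: int, faulty: Set[int]) -> None:
        self.n = n
        self.faulty = set(faulty)
        self.busy: Set[int] = set()
        self.lst: Dict[int, int] = {}  # node address -> LST bitmap
        self.refresh()

    def refresh(self) -> None:
        """Recompute every LST_x. This is a centralized stand-in for the
        broadcast that keeps the local tables consistent; it is not RSB."""
        unavailable = self.busy | self.faulty
        self.lst = {}
        for x in range(1 << self.n):
            if x in self.faulty:
                continue  # only fault-free nodes hold a local table
            bits = 0
            for k in range(self.n + 1):
                # Assumed rule: bit k set iff the k-subcube holding x is usable.
                if not unavailable.intersection(subcube_nodes(x, k)):
                    bits |= 1 << k
            self.lst[x] = bits

    def allocate(self, k: int) -> Optional[int]:
        """Grant the first free k-subcube found in address order.
        Return its base address, or None if no such subcube exists."""
        for x in sorted(self.lst):
            if (self.lst[x] >> k) & 1:
                base = (x >> k) << k
                self.busy.update(subcube_nodes(base, k))
                self.refresh()
                return base
        return None

    def release(self, base: int, k: int) -> None:
        """Return a previously allocated k-subcube to the free pool."""
        self.busy.difference_update(subcube_nodes(base, k))
        self.refresh()


if __name__ == "__main__":
    dst = DistributedSubcubeTable(3, faulty={5})
    print({x: format(b, "04b") for x, b in dst.lst.items()})
    print(dst.allocate(2), dst.allocate(2), dst.allocate(1))
```

Under these assumptions, a 3-cube with node 5 faulty gives $LST_0 = 0111_2$ and $LST_4 = 0001_2$. A request for a 2-cube is granted at base 0. A second 2-cube request fails, because the only other 2-cube, {4, 5, 6, 7}, contains the faulty node. A 1-cube can still be placed at base 6. The faulty node never appears in any granted subcube. This mirrors, in a simplified centralized form, the abstract's goal of keeping faulty nodes out of job allocation.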