A Concurrent Test Architecture for Massively Parallel Computers and Its Error Detection Capability

  • Authors:
  • Marius V. A. Hancu;Kazuhiko Iwasaki;Yuji Sato;Mamoru Sugie

  • Affiliations:
  • -;-;-;-

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1994

Quantified Score

Hi-index 0.00

Visualization

Abstract

Presents new principles for online monitoring in the context of multiprocessors (especially massively parallel processors) and then focuses on the effect of the aliasing probability on the error detection process. In the proposed test architecture, concurrent testing (or online monitoring) at the system level is accomplished by enforcing the run-time testing of the data and control dependences of the algorithm currently being executed on the parallel computer. In order to help in this process, each message contains both source and destination addresses. At each message source, the sequence of destination addresses of the outgoing messages is compressed on a block basis. At the same time, at each destination, the sequence of source addresses of all incoming messages is compressed, also on a block basis. Concurrent compression of the instructions executed by the PEs is also possible. As a result of this procedure, an image of the data dependences and of the control flow of the currently running algorithm is created. This image is compared, at the end of each computational block, with a reference image created at compilation time. The main results of this work are in proposing new principles for the online system-level testing of multiprocessor systems, based on signaturing and monitoring the data dependences together with the control dependences, and in providing an analytical model and analysis for the address compression process used for monitoring the data routing process.