Lightweight monitoring of the progress of remotely executing computations

  • Authors:
  • Shuo Yang;Ali R. Butt;Y. Charlie Hu;Samuel P. Midkiff

  • Affiliations:
  • School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN;School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN;School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN;School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN

  • Venue:
  • LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The increased popularity of grid systems and cycle sharing across organizations requires scalable systems that provide facilities to locate resources, to be fair in the use of those resources, and to monitor jobs executing on remote systems. This paper presents a novel and lightweight approach to monitoring the progress and correctness of a parallel computation on a remote, and potentially fraudulent, host system. We describe a monitoring system that uses a sequence of program counter values to monitor program progress, and compiler techniques that automatically generate the monitoring code. This approach improves on earlier work by omitting the need to duplicate computation, which both simplifies and reduces the overhead of monitoring. Our approach allows dynamic and accountable cycle-sharing across the Internet. Experimental results show that the overhead of our system is negligible and our monitoring approach is scalable.