Trust but verify: monitoring remotely executing programs for progress and correctness

Authors: Shuo Yang, Ali R. Butt, Y. Charlie Hu, Samuel P. Midkiff
Affiliation: Purdue University, West Lafayette, IN (all authors)
Venue: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year: 2005

Abstract

The increased popularity of grid systems and cross-organizational cycle sharing calls for scalable systems that can locate resources, allocate them fairly, and monitor jobs executing on remote hosts. This paper describes GridCop, a system that allows a computation running on a remote and potentially fraudulent host to be monitored for both progress and execution correctness. A novel feature of GridCop is that it constructs cooperating submitter and host programs from the original program. Together, these programs allow progress and correctness to be monitored with negligible overhead while protecting against common fraudulent behaviors. Experimental results show that the monitoring overhead is low on both the submitting and the host machines. We also describe the compiler algorithms that automatically generate the required monitoring code.
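The abstract describes the cooperating host and submitter programs only at a high level. The Python sketch below illustrates one way a host could report progress and a submitter could check it. It is not GridCop's compiler-generated scheme. It assumes the original program is a loop of independent iterations, and it substitutes simple random spot-checking for the paper's verification technique. All names (`Beacon`, `work`, `host_program`, `submitter_check`) and the beacon interval are hypothetical.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class Beacon:
    """Hypothetical progress message sent from the host to the submitter."""
    iteration: int   # loop index the host has reached
    digest: str      # hash of the value computed at that iteration


def work(i: int) -> int:
    """Hypothetical stand-in for one independent iteration of the original program."""
    x = i
    for _ in range(1000):
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)
    return x


def digest(value: int) -> str:
    return hashlib.sha256(str(value).encode()).hexdigest()


def host_program(n: int, every: int,
                 step: Callable[[int], int] = work) -> Iterator[Beacon]:
    """Host-side version (hypothetical): runs the loop and emits a beacon
    every `every` iterations. A fraudulent host can pass a fake `step`."""
    for i in range(n):
        value = step(i)
        if i % every == 0:
            yield Beacon(i, digest(value))


def submitter_check(beacons: Iterator[Beacon], samples: int,
                    seed: int = 0) -> bool:
    """Submitter-side version (hypothetical): checks that beacon indices
    strictly increase, then re-executes a random sample of the reported
    iterations and compares digests (spot-checking, not GridCop's method)."""
    rng = random.Random(seed)
    received: List[Beacon] = []
    last = -1
    for b in beacons:
        if b.iteration <= last:   # progress must move strictly forward
            return False
        last = b.iteration
        received.append(b)
    for b in rng.sample(received, min(samples, len(received))):
        if digest(work(b.iteration)) != b.digest:   # recompute and compare
            return False
    return True


honest_ok = submitter_check(host_program(100, 10), samples=3)
cheater_ok = submitter_check(host_program(100, 10, step=lambda i: 0), samples=3)
print(honest_ok, cheater_ok)
```

An honest host passes both checks. A host that skips the real work and reports a constant value fails the spot check. The ordering check alone cannot detect a host that stalls, so a practical monitor would also need timeouts. The number of sampled iterations trades submitter-side cost against the probability of catching a host that cheats on only some iterations.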