Formal methods applied to high-performance computing software design: a case study of MPI one-sided communication-based locking

  • Authors:
  • Salman Pervez;Ganesh Gopalakrishnan;Robert M. Kirby;Rajeev Thakur;William Gropp

  • Affiliations:
  • Department of Computer Science, Purdue University, West Lafayette, IN, U.S.A.;School of Computing, University of Utah, Salt Lake City, UT 84112, U.S.A.;School of Computing, University of Utah, Salt Lake City, UT 84112, U.S.A.;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.;Department of Computer Science, University of Illinois, Urbana, IL 61801, U.S.A.

  • Venue:
  • Software—Practice & Experience
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

There is a growing need to address the complexity of verifying the numerous concurrent protocols employed in the high-performance computing software. Today's approaches for verification consist of testing detailed implementations of these protocols. Unfortunately, this approach can seldom show the absence of bugs, and often results in serious bugs escaping into the deployed software. An approach called Model Checking has been demonstrated to be eminently helpful in debugging these protocols early in the software life cycle by offering the ability to represent and exhaustively analyze simplified formal protocol models. The effectiveness of model checking has yet to be adequately demonstrated in high-performance computing. This paper presents a case study of a concurrent protocol that was thought to be sufficiently well tested, but proved to contain two very non-obvious deadlocks in them. These bugs were automatically detected through model checking. The protocol models in which these bugs were detected were also easy to create. Recent work in our group demonstrates that even this tedium of model creation can be eliminated by employing dynamic source-code-level analysis methods. Our case study comes from the important domain of Message Passing Interface (MPI)-based programming, which is universally employed for simulating and predicting anything from the structural integrity of combustion chambers to the path of hurricanes. We argue that model checking must be taught as well as used widely within HPC, given this and similar success stories. Copyright © 2009 John Wiley & Sons, Ltd. A shorter version of this paper was published in [1].