Debugging parallel programs is an order of magnitude more complex than debugging sequential ones, yet most parallel debuggers offer little functionality beyond that of their sequential counterparts. The problem grows more serious as computational codes become more complex and involve larger data structures, and as the machines themselves grow. Peta-scale machines consisting of millions of cores pose a significant challenge for existing techniques. We argue that debugging must become more data-centric, and believe that "assertions" provide a useful model. Assertions allow users to declare their expectations about the program state as a whole, rather than about the state of a single process. Previously, we implemented a special type of assertion that supports debugging applications as they evolve or are ported to different platforms. These "relative debugging" assertions allow a user to compare the state of one program against that of a reference version. Whilst powerful, they pose significant implementation challenges on large peta-scale machines. In this paper we discuss a hashing technique that provides a scalable solution for very large problems on very large machines. We illustrate the scheme on 65k cores of Kraken, a Cray XT5 at the University of Tennessee.
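The abstract does not describe the hashing scheme itself, so the following is only a minimal sketch of the general idea, under stated assumptions: instead of moving whole data structures to a central point for comparison, each side hashes fixed-size chunks of its data, and only the short digests are compared. The function names (`chunk_digests`, `mismatched_chunks`), the choice of SHA-256, and the chunk size are hypothetical and are not taken from the paper. The sketch also tests for bit-exact equality only, whereas a practical relative debugger would probably need a numerical tolerance.

```python
import hashlib
import struct


def chunk_digests(values, chunk_size):
    """Return one SHA-256 digest per fixed-size chunk of a list of floats.

    Hypothetical helper: in a parallel setting each process would hash
    only the chunks it owns, so no raw data has to leave the process.
    """
    digests = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # Pack as little-endian doubles so both programs hash identical bytes.
        packed = struct.pack(f"<{len(chunk)}d", *chunk)
        digests.append(hashlib.sha256(packed).digest())
    return digests


def mismatched_chunks(reference, suspect, chunk_size=4):
    """Return the indices of chunks whose digests differ.

    An empty list means the assertion "suspect == reference" holds,
    up to the negligible chance of a hash collision.
    """
    if len(reference) != len(suspect):
        raise ValueError("arrays must have the same length")
    ref = chunk_digests(reference, chunk_size)
    sus = chunk_digests(suspect, chunk_size)
    return [i for i, (a, b) in enumerate(zip(ref, sus)) if a != b]


# Usage: a reference array and a ported version with one corrupted element.
reference = [float(i) for i in range(16)]
suspect = list(reference)
suspect[9] = 9.5
print(mismatched_chunks(reference, suspect))
```

In this sketch the amount of data compared is proportional to the number of chunks rather than the number of elements, and the index of a differing chunk narrows down where the two program states diverge.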