Non-Intrusive Detection of Synchronization Errors Using Execution Replay

Authors:
Michiel Ronsse; Koen De Bosschere
Affiliations:
Department of Electronics and Information Systems, Ghent University, Belgium (Michiel Ronsse: ronsse@elis.rug.ac.be; Koen De Bosschere: kdb@elis.rug.ac.be)
Venue:
Automated Software Engineering
Year:
2002

Abstract

This paper presents a practical solution for detecting synchronization errors in parallel programs. It addresses three kinds of errors: a lack of synchronization, resulting in data races; conflicting synchronization, resulting in deadlocks; and redundant synchronization, resulting in a performance penalty. The solution combines RecPlay, an efficient execution replay mechanism, with automatic on-the-fly detection of data races, deadlocks, and redundant synchronization during a replayed execution. Detecting data races, deadlocks, and redundant synchronization normally introduces significant overhead during an execution, which may alter that execution. Performing these expensive operations during a replayed, and therefore unaltered, execution instead causes almost no probe effect. Furthermore, memory consumption during data race detection is limited through the use of multilevel bitmaps and snooped matrix clocks. Because the record phase of RecPlay is highly efficient, it never needs to be switched off: tracing can be left on all the time, thereby eliminating the possibility of Heisenbugs.
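The abstract does not spell out how races are identified during replay. A common approach, assumed here rather than taken from the paper, is to split each thread's execution into segments delimited by synchronization operations, stamp each segment with a vector clock, and flag two segments as racing when neither clock precedes the other and one segment writes an address the other reads or writes. The minimal Python sketch below uses hypothetical names (`Segment`, `happens_before`, `races`) and leaves out the snooped matrix clocks and multilevel bitmaps the paper uses to save memory.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical model of a segment: the instructions one thread executes
    between two consecutive synchronization operations."""
    thread: int
    clock: tuple                              # vector clock of the segment
    reads: set = field(default_factory=set)   # addresses read in the segment
    writes: set = field(default_factory=set)  # addresses written in the segment


def happens_before(a, b):
    """True if vector clock a strictly precedes b (componentwise <=, not equal)."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def races(s1, s2):
    """Return the addresses on which two segments race: the segments are
    unordered by happens-before and at least one of the accesses is a write."""
    if happens_before(s1.clock, s2.clock) or happens_before(s2.clock, s1.clock):
        return set()
    return (s1.writes & (s2.reads | s2.writes)) | (s2.writes & s1.reads)


# Thread 0 writes x while thread 1 reads x without synchronization.
a = Segment(0, (1, 0), writes={"x"})
b = Segment(1, (0, 1), reads={"x"})
print(races(a, b))  # {'x'}

# Thread 1's segment starts after synchronizing with thread 0,
# so its clock dominates a's and the accesses are ordered.
c = Segment(1, (1, 2), reads={"x"})
print(races(a, c))  # set()
```

Because the replayed execution is unaltered, running this comparison on every pair of concurrent segments can be as slow as needed without changing which interleaving is observed.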
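The multilevel bitmaps mentioned in the abstract keep the per-segment read and write sets sparse. The sketch below assumes a two-level layout with 4 KiB pages (a 12-bit offset, one bit per byte address), an illustrative choice rather than the paper's parameters: a first-level table maps page numbers to second-level bitmaps, which are allocated only for pages a segment actually touches. A Python `dict` stands in for the first-level table and a Python `int` for each page bitmap; the class name `TwoLevelBitmap` is hypothetical.

```python
class TwoLevelBitmap:
    """Hypothetical sparse address set: a first-level table (a dict here)
    maps page numbers to second-level bitmaps (Python ints), created lazily."""

    PAGE_BITS = 12                     # assumed: 4 KiB pages, one bit per byte address
    PAGE_MASK = (1 << PAGE_BITS) - 1

    def __init__(self):
        self.pages = {}                # page number -> bitmap of touched offsets

    def add(self, addr):
        page, off = addr >> self.PAGE_BITS, addr & self.PAGE_MASK
        self.pages[page] = self.pages.get(page, 0) | (1 << off)

    def __contains__(self, addr):
        page, off = addr >> self.PAGE_BITS, addr & self.PAGE_MASK
        return (self.pages.get(page, 0) >> off) & 1 == 1

    def intersects(self, other):
        """True if both sets contain a common address; only pages present
        in both first-level tables are compared."""
        common = self.pages.keys() & other.pages.keys()
        return any(self.pages[p] & other.pages[p] for p in common)


writes = TwoLevelBitmap()
writes.add(0x1000)
writes.add(0x1004)
reads = TwoLevelBitmap()
reads.add(0x1004)
print(0x1004 in writes)          # True
print(writes.intersects(reads))  # True
print(len(writes.pages))         # 1 (both addresses lie on page 0x1)
```

A production detector would more likely index a fixed array with the high address bits; the dictionary only keeps the sketch short. Either way, comparing two sets touches only pages present in both, which keeps the conflict check cheap when segments access few pages.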
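The abstract does not describe how deadlocks are detected. One standard technique, assumed here, is to check the wait-for graph for a cycle whenever a thread blocks on a lock. With plain mutexes each blocked thread waits on a single lock held by a single owner, so every node has at most one outgoing edge and a cycle can be found by following the chain of owners. `find_deadlock`, `owner`, and `waiting_for` are hypothetical names.

```python
def find_deadlock(owner, waiting_for):
    """owner: lock -> thread currently holding it.
    waiting_for: thread -> lock it is blocked on.
    Each blocked thread has at most one outgoing wait-for edge (to the owner
    of its lock), so following the chain from each thread finds any cycle.
    Returns the threads on a cycle, or None if there is no deadlock."""
    for start in waiting_for:
        path = []
        t = start
        while t is not None and t not in path:
            path.append(t)
            lock = waiting_for.get(t)
            t = owner.get(lock) if lock is not None else None
        if t is not None:              # revisited a thread: cycle found
            return path[path.index(t):]
    return None


# Thread 1 holds A and waits for B; thread 2 holds B and waits for A.
print(find_deadlock({"A": 1, "B": 2}, {1: "B", 2: "A"}))  # [1, 2]
# Thread 2 waits for A, held by thread 1, which is not blocked.
print(find_deadlock({"A": 1}, {2: "A"}))                  # None
```

Running the check during replay rather than in the original run means its cost cannot change the timing that led to the deadlock.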