Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
On-the-fly detection of access anomalies
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
FFTs in external or hierarchical memory
The Journal of Supercomputing
An empirical comparison of monitoring algorithms for access anomaly detection
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Techniques for debugging parallel programs with flowback analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Detecting data races on weak memory systems
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
On-the-fly detection of data races for programs with nested fork-join parallelism
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Distributed operating systems
Techniques for reducing consistency-related communication in distributed shared-memory systems
ACM Transactions on Computer Systems (TOCS)
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Deterministic replay of Java multithreaded applications
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Parallel programming: techniques and applications using networked workstations and parallel computers
The grid: blueprint for a new computing infrastructure
The grid: blueprint for a new computing infrastructure
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Efficient and precise datarace detection for multithreaded object-oriented programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Static conflict analysis for multi-threaded object-oriented programs
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Software DSM Protocols that Adapt between Single Writer and Multiple Writer
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
RacerX: effective, static detection of race conditions and deadlocks
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Race checking by context inference
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimized run-time race detection and atomicity checking using partial discovered types
Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering
A Transparent Distributed Shared Memory for Clustered Symmetric Multiprocessors
The Journal of Supercomputing
Teamster-G: a grid-enabled software DSM system
CCGRID '05 Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2 - Volume 02
TreadMarks: distributed shared memory on standard workstations and operating systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Detecting data races using dynamic escape analysis based on read barrier
VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3
TRaDe, a topological approach to on-the-fly race detection in java programs
JVM'01 Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1
Editorial: Special section: Grid computing and the message passing interface
Future Generation Computer Systems
netWorker - Cloud computing: PC functions move onto the web
Software transactional distributed shared memory
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Future Generation Computer Systems
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
Hi-index | 0.00 |
Distributed shared memory (DSM) allows parallel programs to run on distributed computers by simulating a global virtual shared memory, but data racing bugs may easily occur when the threads of a multi-threaded process concurrently access the physically distributed memory. Earlier tools to help programmers locate data racing bugs in non-DSM parallel programs are not easily applied to DSM systems. This study presents the data race avoidance and replay scheme (DRARS) to assist debugging parallel programs on DSM or multi-core systems. DRARS is a novel tool which controls the consistency protocol of the target program, automatically preventing a large class of data racing bugs when the parallel program is subsequently run, obviating much of the need for manual debugging. For data racing bugs that cannot be avoided automatically, DRARS performs a deterministic replay-type function on DSM systems, faithfully reproducing the behavior of the parallel program during run time. Because one class of data racing bugs has already been eliminated, the remaining manual debugging task is greatly simplified. Unlike previous debugging methods, DRARS does not require that the parallel program be written in a specific style or programming language. Moreover, DRARS can be implemented in most consistency protocols. In this paper, DRARS is realized and verified in real experiments using the eager release consistency protocol on a DSM system with various applications.