Enabling tracing Of long-running multithreaded programs via dynamic execution reduction

Authors:
Sriraman Tallam;Chen Tian;Rajiv Gupta;Xiangyu Zhang
Affiliations:
University of Arizona;University of Arizona;University of Arizona;Purdue University
Venue:
Proceedings of the 2007 international symposium on Software testing and analysis
Year:
2007

Citing 20
Cited 8

Dynamic program slicing

Information Processing Letters
Debugging distributed C programs by real time reply

PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
Supporting reverse execution for parallel programs

PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
Optimal tracing and incremental reexecution for debugging long-running programs

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Pointer analysis for multithreaded programs

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An efficient relevant slicing method for debugging

ESEC/FSE-7 Proceedings of the 7th European software engineering conference held jointly with the 7th ACM SIGSOFT international symposium on Foundations of software engineering
Pointer and escape analysis for multithreaded programs

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A "flight data recorder" for enabling full-system multiprocessor deterministic replay

Proceedings of the 30th annual international symposium on Computer architecture
Record/replay for nondeterministic program executions

Communications of the ACM - Why CS students need math
Whole Execution Traces

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Proceedings of the 32nd annual international symposium on Computer Architecture
Jockey: a user-space library for record-replay debugging

Proceedings of the sixth international symposium on Automated analysis-driven debugging
Rx: treating bugs as allergies---a safe method to survive software failures

Proceedings of the twentieth ACM symposium on Operating systems principles
Locating faulty code using failure-inducing chops

Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering
A regulated transitive reduction (RTR) for longer memory race recording

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Recording shared memory dependencies using strata

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Dynamic slicing long running programs through execution fast forwarding

Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering
Framework for instruction-level tracing and analysis of program executions

Proceedings of the 2nd international conference on Virtual execution environments
Flashback: a lightweight extension for rollback and deterministic replay for software debugging

ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Enhancing server availability and security through failure-oblivious computing

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6

A trace simplification technique for effective debugging of concurrent programs

Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Toward generating reducible replay logs

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
An efficient static trace simplification technique for debugging concurrent programs

SAS'11 Proceedings of the 18th international conference on Static analysis
Dynamic trace-based analysis of vectorization potential of applications

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
A system for debugging via online tracing and dynamic slicing

Software—Practice & Experience
LEAN: simplifying concurrency bug reproduction via replay-supported execution reduction

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Scaling predictive analysis of concurrent programs by removing trace redundancy

ACM Transactions on Software Engineering and Methodology (TOSEM)
DrDebug: Deterministic Replay based Cyclic Debugging with Dynamic Slicing

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

Debugging long running multithreaded programs is a very challenging problem when using tracing-based analyses. Since such programs are non-deterministic, reproducing the bug is non-trivial and generating and inspecting traces for long running programs can be prohibitively expensive. We propose a framework in which, to overcome the problem of bug reproducibility, a lightweight logging technique is used to log the events during the original execution. When a bug is encountered, it is reproduced using the generated log and during the replay, a fine-grained tracing technique is employed to collect control-flow/dependence traces that are then used to locate the root cause of the bug. In this paper, we address the key challenges resulting due to tracing, that is, the prohibitively high expense of collecting traces and the significant burden on the user who must examine the large amount of trace information to locate the bug in a long-running multithreaded program. These challenges are addressed through execution reduction that realizes a combination of logging and tracing such that traces collected contain only the execution information from those regions of threads that are relevant to the fault. This approach is highly effective because we observe that for long running multithreaded programs, many threads that execute are irrelevant to the fault. Hence, these threads need not be replayed and traced when trying to reproduce the bug. We develop a novel lightweight scheme that identifies such threads by observing all the interthread data dependences and removes their execution footprint in the replay run. In addition, we identify regions of thread executions that need not be replayed or, if they must be replayed, we determine if they need not be traced. Following execution reduction, the replayed execution takes lesser time to run and it produces a much smaller trace than the original execution. Thus, the cost of collecting traces and the effort of examining the traces to locate the fault are greatly reduced.