DTAM: dynamic taint analysis of multi-threaded programs for relevancy

  • Authors:
  • Malay Ganai;Dongyoon Lee;Aarti Gupta

  • Affiliations:
  • NEC Labs America;University of Michigan;NEC Labs America

  • Venue:
  • Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Testing and debugging multi-threaded programs are notoriously difficult due to non-determinism not only in inputs but also in OS schedules. In practice, dynamic analysis and failure replay systems instrument the program to record events of interest in the test execution, e.g., program inputs, accesses to shared objects, synchronization operations, context switches, etc. To reduce the overhead of logging during runtime, these testing and debugging efforts have proposed tradeoffs for sampling or selective logging, at the cost of reducing coverage or performing more expensive search offline. We propose to identify a subset of input sources and shared objects that are, in a sense, relevant for covering program behavior. We classify various types of relevancy in terms of how an input source or a shared object can affect control flow (e.g., a conditional branch) or dataflow (e.g., state of the shared objects) in the program. Such relevancy data can be used by testing and debugging methods to reduce their recording overhead and to guide coverage. To conduct relevancy analysis, we propose a novel framework based on dynamic taint analysis for multi-threaded programs, called DTAM. It performs thread-modular taint analysis for each thread in parallel during runtime, and then aggregates the thread-modular results offline. This approach has many advantages: (a) it is faster than conducting taint analysis for serialized multi-threaded executions, (b) it can compute results for alternate thread interleavings by generalizing the observed execution, and (c) it provides a knob to tradeoff precision with coverage, depending on how thread-modular results are aggregated to account for alternate interleavings. We have implemented DTAM and performed an experimental evaluation on publicly available benchmarks for relevancy analysis. Our experiments show that most shared accesses and conditional branches are dependent on some program input sources. Interestingly in our test runs, on average, only about 25% input sources and 3% shared objects affect other shared accesses through conditional branches. Thus, it is important to identify such relevant input sources and shared objects for testing and debugging.