Path- and index-sensitive string analysis based on monadic second-order logic

  • Authors:
  • Takaaki Tateishi;Marco Pistoia;Omer Tripp

  • Affiliations:
  • IBM Research - Tokyo, Japan;IBM T. J. Waton Research Center, Yorktown Heights, NY;IBM Software Group and Tel Aviv University, Israel

  • Venue:
  • ACM Transactions on Software Engineering and Methodology (TOSEM) - Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a novel technique for statically verifying the strings generated by a program. The verification is conducted by encoding the program in Monadic Second-order Logic (M2L). We use M2L to describe constraints among program variables and to abstract built-in string operations. Once we encode a program in M2L, a theorem prover for M2L, such as MONA, can automatically check if a string generated by the program satisfies a given specification, and if not, exhibit a counterexample. With this approach, we can naturally encode relationships among strings, accounting also for cases in which a program manipulates strings using indices. In addition, our string analysis is path sensitive in that it accounts for the effects of string and Boolean comparisons, as well as regular-expression matches. We have implemented our string analysis algorithm, and used it to augment an industrial security analysis for Web applications by automatically detecting and verifying sanitizers—methods that eliminate malicious patterns from untrusted strings, making these strings safe to use in security-sensitive operations. On the 8 benchmarks we analyzed, our string analyzer discovered 128 previously unknown sanitizers, compared to 71 sanitizers detected by a previously presented string analysis.