Merlin: specification inference for explicit information flow problems

Authors:
Benjamin Livshits;Aditya V. Nori;Sriram K. Rajamani;Anindya Banerjee
Affiliations:
Microsoft Research, Redmond, WA, USA;Microsoft Research, Bangalore, India;Microsoft Research, Bangalore, India;IMDEA Software, Madrid, Spain
Venue:
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Year:
2009

References: 27
Cited by: 24

An axiomatic basis for computer programming

Communications of the ACM
Bugs as deviant behavior: a general approach to inferring errors in systems code

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Automatic extraction of object-oriented component interfaces

ISSTA '02 Proceedings of the 2002 ACM SIGSOFT international symposium on Software testing and analysis
Understanding belief propagation and its generalizations

Exploring artificial intelligence in the new millennium
Securing web application code by static analysis and runtime protection

Proceedings of the 13th international conference on World Wide Web
Abstraction, Refinement and Proof for Probabilistic Systems

Monographs in Computer Science
Abstraction and refinement in probabilistic systems

ACM SIGMETRICS Performance Evaluation Review
DynaMine: finding common error patterns by mining software revision histories

Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code

Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Finding application errors and security flaws using PQL: a program query language

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Dynamic Taint Propagation for Java

ACSAC '05 Proceedings of the 21st Annual Computer Security Applications Conference
The essence of command injection attacks in web applications

Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities (Short Paper)

SP '06 Proceedings of the 2006 IEEE Symposium on Security and Privacy
Perracotta: mining temporal API rules from imperfect traces

Proceedings of the 28th international conference on Software engineering
Raksha: a flexible information flow architecture for software security

Proceedings of the 34th annual international symposium on Computer architecture
Static specification inference using predicate mining

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Finding security vulnerabilities in java applications with static analysis

SSYM'05 Proceedings of the 14th conference on USENIX Security Symposium - Volume 14
Static detection of security vulnerabilities in scripting languages

USENIX-SS'06 Proceedings of the 15th conference on USENIX Security Symposium - Volume 15
Improving software security with precise static and runtime analysis

Ph.D. thesis, Stanford University
Information flow control for standard OS abstractions

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
From uncertainty to belief: inferring the specification within

OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Making information flow explicit in HiStar

OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Labels and event processes in the Asbestos operating system

ACM Transactions on Computer Systems (TOCS)
Mining temporal specifications for error detection

TACAS'05 Proceedings of the 11th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Defending against injection attacks through context-sensitive string evaluation

RAID'05 Proceedings of the 8th international conference on Recent Advances in Intrusion Detection
Factor graphs and the sum-product algorithm

IEEE Transactions on Information Theory
Language-based information-flow security

IEEE Journal on Selected Areas in Communications

Cited by:
Verification, Testing and Statistics

TAP '09 Proceedings of the 3rd International Conference on Tests and Proofs
Verification, Testing and Statistics

ICTAC '09 Proceedings of the 6th International Colloquium on Theoretical Aspects of Computing
Verification, Testing and Statistics

FM '09 Proceedings of the 2nd World Congress on Formal Methods
Permissive dynamic information flow analysis

PLAS '10 Proceedings of the 5th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security
Analyzing explicit information flow

ICISS'10 Proceedings of the 6th international conference on Information systems security
Probabilistic, modular and scalable inference of typestate specifications

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
kb-anonymity: a model for anonymized behaviour-preserving test and debugging data

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Path- and index-sensitive string analysis based on monadic second-order logic

Proceedings of the 2011 International Symposium on Software Testing and Analysis
Fast and precise sanitizer analysis with BEK

SEC'11 Proceedings of the 20th USENIX conference on Security
Static detection of access control vulnerabilities in web applications

SEC'11 Proceedings of the 20th USENIX conference on Security
Program analysis and machine learning: a win-win deal

SAS'11 Proceedings of the 18th international conference on Static analysis
SCRIPTGARD: automatic context-sensitive sanitization for large-scale legacy web applications

Proceedings of the 18th ACM conference on Computer and communications security
F4F: taint analysis of framework-based web applications

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
RoleCast: finding missing security checks when you do not know what checks are

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
SAFERPHP: finding semantic vulnerabilities in PHP applications

Proceedings of the ACM SIGPLAN 6th Workshop on Programming Languages and Analysis for Security
Program analysis and machine learning: a win-win deal

APLAS'11 Proceedings of the 9th Asian conference on Programming Languages and Systems
Measuring enforcement windows with symbolic trace interpretation: what well-behaved programs say

Proceedings of the 2012 International Symposium on Software Testing and Analysis
Automated detection of client-state manipulation vulnerabilities

Proceedings of the 34th International Conference on Software Engineering
On the naturalness of software

Proceedings of the 34th International Conference on Software Engineering
Towards fully automatic placement of security sanitizers and declassifiers

POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Inferring likely mappings between APIs

Proceedings of the 2013 International Conference on Software Engineering
Chucky: exposing missing checks in source code for vulnerability discovery

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Path- and index-sensitive string analysis based on monadic second-order logic

ACM Transactions on Software Engineering and Methodology (TOSEM) - Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance
Toward general diagnosis of static errors

Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

Abstract

The last several years have seen a proliferation of static and runtime analysis tools for finding security violations caused by explicit information flow in programs. Much of this interest stems from the growing number of vulnerabilities such as cross-site scripting and SQL injection. In fact, these explicit information flow vulnerabilities, commonly found in Web applications, now outnumber vulnerabilities such as buffer overruns that are common in type-unsafe languages like C and C++. Tools that check for these vulnerabilities require a specification to operate, and in most cases the task of providing it is delegated to the user. Moreover, these tools are only as effective as their specification. Unfortunately, writing a comprehensive specification is a major challenge: parts of it are easy to miss, leading to missed vulnerabilities, while incorrect entries may lead to false positives.

This paper proposes Merlin, a new approach for automatically inferring explicit information flow specifications from program code. Inferred specifications greatly reduce manual labor and improve the quality of results produced by tools that check for security violations caused by explicit information flow. Starting from a data propagation graph, which represents the interprocedural flow of information in the program, Merlin aims to infer an information flow specification automatically. Merlin models information flow paths in the propagation graph using probabilistic constraints. A naive model requires an exponential number of constraints, one per path in the propagation graph. For scalability, we approximate these path constraints using constraints on chosen triples of nodes, which yields a cubic number of constraints. We characterize this approximation as a probabilistic abstraction, using the theory of probabilistic refinement developed by McIver and Morgan.
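The gap between per-path and per-triple constraints can be made concrete with a small count. The sketch below is a hypothetical illustration, not Merlin's implementation: it builds a diamond-chain graph (an assumed shape, chosen because its path count doubles with each diamond), counts the paths a naive model would need one constraint for, and enumerates every triple (s, w, n) in which w is reachable from s and n is reachable from w. Merlin uses chosen triples, so this full enumeration is only an upper bound on the number of triple constraints; all function names are assumptions.

```python
# Sketch: per-path constraints blow up, per-triple constraints stay polynomial.
# The diamond-chain graph is a hypothetical propagation graph, not one from the paper.
# Assumes the graph is acyclic (DAG).


def diamond_chain(k):
    """Build k diamonds in series: v0 -> {a0, b0} -> v1 -> ... -> vk."""
    graph = {}
    for i in range(k):
        v, nxt, a, b = f"v{i}", f"v{i + 1}", f"a{i}", f"b{i}"
        graph.setdefault(v, []).extend([a, b])
        graph[a] = [nxt]
        graph[b] = [nxt]
    graph.setdefault(f"v{k}", [])
    return graph


def count_paths(graph, src, dst, memo=None):
    """Number of distinct src -> dst paths (one naive path constraint each)."""
    if memo is None:
        memo = {}
    if src == dst:
        return 1
    if src not in memo:
        memo[src] = sum(count_paths(graph, nxt, dst, memo) for nxt in graph[src])
    return memo[src]


def reachable(graph):
    """Map each node to the set of nodes reachable from it via one or more edges."""
    reach = {}

    def visit(u):
        if u not in reach:
            out = set()
            for w in graph[u]:
                out.add(w)
                out |= visit(w)
            reach[u] = out
        return reach[u]

    for u in graph:
        visit(u)
    return reach


def triples(graph):
    """All (s, w, n) with w reachable from s and n reachable from w."""
    reach = reachable(graph)
    return [(s, w, n) for s in graph for w in reach[s] for n in reach[w]]


for k in (5, 10, 15):
    g = diamond_chain(k)
    print(f"k={k}: nodes={len(g)}, paths={count_paths(g, 'v0', f'v{k}')}, triples={len(triples(g))}")
```

The path count grows as 2**k, while the triple count can never exceed the cube of the node count, which is the trade-off the abstract describes.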
We solve the resulting system of probabilistic constraints using factor graphs, a well-known structure for performing probabilistic inference. We experimentally validate Merlin by applying it to 10 large, business-critical Web applications that have been analyzed with CAT.NET, a state-of-the-art static analysis tool for .NET. We find a total of 167 new confirmed specifications, which expose 322 additional vulnerabilities across the 10 benchmarks. More accurate specifications also reduce false positives: in our experiments, Merlin-inferred specifications eliminate 13 false positives, a 15% reduction in the CAT.NET false positive rate on these 10 programs. After applying Merlin, the CAT.NET false positive rate in our experiments drops to under 1%.
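To make the factor-graph step concrete, here is a minimal sketch under stated assumptions. It treats each role of a node (source, sanitizer, sink) as a boolean variable, as is common for taint-style specifications; it encodes one triple as the soft rule "if n1 is a source and n3 is a sink, then n2 is probably a sanitizer"; and it computes marginals by enumerating every joint assignment. The names (`marginals`, `prior`, `triple_constraint`), the probabilities, and the constraint weight are hypothetical, and exhaustive enumeration stands in for the sum-product (belief propagation) algorithm that makes factor graphs practical at realistic sizes.

```python
from itertools import product


def marginals(variables, factors):
    """Exact marginals P(var = 1) by brute-force enumeration of all assignments.

    Only feasible for tiny graphs; it stands in for sum-product inference.
    """
    totals = {v: 0.0 for v in variables}
    partition = 0.0
    for values in product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, values))
        weight = 1.0
        for scope, fn in factors:
            weight *= fn(*(assign[v] for v in scope))
        partition += weight
        for v in variables:
            if assign[v]:
                totals[v] += weight
    return {v: totals[v] / partition for v in variables}


def prior(p):
    """Unary factor: prior belief p that the role holds."""
    return lambda x: p if x else 1.0 - p


def triple_constraint(confidence):
    """Soft (hypothetical) rule for one triple: source(s) and sink(n) => sanitizer(w)."""
    return lambda s, w, n: (1.0 - confidence) if (s and n and not w) else confidence


# Hypothetical three-node flow n1 -> n2 -> n3 with made-up probabilities.
variables = ["source:n1", "sanitizer:n2", "sink:n3"]
factors = [
    (("source:n1",), prior(0.9)),     # n1 is believed to be a source
    (("sanitizer:n2",), prior(0.5)),  # no prior opinion about n2
    (("sink:n3",), prior(0.9)),       # n3 is believed to be a sink
    (("source:n1", "sanitizer:n2", "sink:n3"), triple_constraint(0.9)),
]

for name, p in marginals(variables, factors).items():
    print(f"P({name}) = {p:.3f}")
```

With these made-up numbers, the sanitizer marginal for n2 rises from its 0.5 prior to about 0.78, because the triple constraint penalizes the one assignment in which a likely source reaches a likely sink unsanitized.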