Vulnerability extrapolation: assisted discovery of vulnerabilities using machine learning

Authors:
Fabian Yamaguchi;Felix Lindner;Konrad Rieck
Affiliations:
Recurity Labs GmbH, Germany;Recurity Labs GmbH, Germany;Technische Universität Berlin, Germany
Venue:
WOOT'11 Proceedings of the 5th USENIX conference on Offensive technologies
Year:
2011

Abstract

Rigorous identification of vulnerabilities in program code is key to implementing and operating secure systems. Unfortunately, only some types of vulnerabilities can be detected automatically. While techniques from software testing can accelerate the search for security flaws, in the general case the discovery of vulnerabilities is a tedious process that requires significant expertise and time. In this paper, we propose a method for assisted discovery of vulnerabilities in source code. Our method proceeds by embedding code in a vector space and automatically determining API usage patterns using machine learning. Starting from a known vulnerability, these patterns can be exploited to guide the auditing of code and to identify potentially vulnerable code with similar characteristics, a process we refer to as vulnerability extrapolation. We empirically demonstrate the capabilities of our method in different experiments. In a case study with the library FFmpeg, we are able to narrow the search for interesting code from 6,778 to 20 functions and discover two security flaws, one being a known flaw and the other constituting a zero-day vulnerability.
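
The idea of embedding code in a vector space and extrapolating from a known vulnerability can be illustrated with a minimal sketch. The abstract does not specify the features, weighting, or learning step, so everything here is an assumption for illustration: each function is reduced to the bag of API symbols it uses, weighted with TF-IDF, and the remaining functions are ranked by cosine similarity to the known-vulnerable one. The step that derives API usage patterns with machine learning is left out, and the function names and symbols in the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: function name -> API symbols it uses.
FUNCTIONS = {
    "decode_a": ["malloc", "memcpy", "get_bits", "free"],
    "decode_b": ["malloc", "memcpy", "get_bits"],
    "log_msg": ["printf", "strlen"],
    "parse_hdr": ["get_bits", "printf"],
}


def embed(functions):
    """Map each function to a unit-length TF-IDF vector over API symbols."""
    n = len(functions)
    # Document frequency: in how many functions each symbol occurs.
    df = Counter(sym for syms in functions.values() for sym in set(syms))
    vectors = {}
    for name, syms in functions.items():
        tf = Counter(syms)
        vec = {s: c * math.log(n / df[s]) for s, c in tf.items()}
        # Guard against an all-zero vector (every symbol used everywhere).
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[name] = {s: w / norm for s, w in vec.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(s, 0.0) for s, w in u.items())


def extrapolate(vectors, known, k=3):
    """Rank the other functions by similarity to a known-vulnerable one."""
    query = vectors[known]
    scored = [
        (cosine(query, vec), name)
        for name, vec in vectors.items()
        if name != known
    ]
    # Highest similarity first; ties broken by name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]


ranking = extrapolate(embed(FUNCTIONS), "decode_a")
print(ranking)
```

In this toy setting an auditor would start from `decode_a` and review the top of the ranking first, which mirrors, at a much smaller scale, narrowing a code base down to a short list of candidate functions.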