A vector space model for automatic indexing
Communications of the ACM
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Code-Red: a case study on the spread and victims of an internet worm
Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment
ITS4: A static vulnerability scanner for C and C++ code
ACSAC '00 Proceedings of the 16th Annual Computer Security Applications Conference
Anomaly detection of web-based attacks
Proceedings of the 10th ACM conference on Computer and communications security
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques
IEEE Transactions on Software Engineering
DynaMine: finding common error patterns by mining software revision histories
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
MisleadingWorm Signature Generators Using Deliberate Noise Injection
SP '06 Proceedings of the 2006 IEEE Symposium on Security and Privacy
Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities (Short Paper)
SP '06 Proceedings of the 2006 IEEE Symposium on Security and Privacy
How to design a good API and why it matters
Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications
Finding security vulnerabilities in java applications with static analysis
SSYM'05 Proceedings of the 14th conference on USENIX Security Symposium - Volume 14
Challenging the anomaly detection paradigm: a provocative discussion
NSPW '06 Proceedings of the 2006 workshop on New security paradigms
The ghost in the browser analysis of web-based malware
HotBots'07 Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets
Fuzzing: Brute Force Vulnerability Discovery
Fuzzing: Brute Force Vulnerability Discovery
Linear-Time Computation of Similarity Measures for Sequential Data
The Journal of Machine Learning Research
Casting out Demons: Sanitizing Training Data for Anomaly Sensors
SP '08 Proceedings of the 2008 IEEE Symposium on Security and Privacy
ACSAC '08 Proceedings of the 2008 Annual Computer Security Applications Conference
Proceedings of the 2008 workshop on New security paradigms
Automated classification and analysis of internet malware
RAID'07 Proceedings of the 10th international conference on Recent advances in intrusion detection
Outside the Closed World: On Using Machine Learning for Network Intrusion Detection
SP '10 Proceedings of the 2010 IEEE Symposium on Security and Privacy
SP '10 Proceedings of the 2010 IEEE Symposium on Security and Privacy
On the infeasibility of modeling polymorphic shellcode
Machine Learning
A sense of self for Unix processes
SP'96 Proceedings of the 1996 IEEE conference on Security and privacy
Paragraph: thwarting signature learning by training maliciously
RAID'06 Proceedings of the 9th international conference on Recent Advances in Intrusion Detection
Generalized vulnerability extrapolation using abstract syntax trees
Proceedings of the 28th Annual Computer Security Applications Conference
Chucky: exposing missing checks in source code for vulnerability discovery
Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Hi-index | 0.00 |
Rigorous identification of vulnerabilities in program code is a key to implementing and operating secure systems. Unfortunately, only some types of vulnerabilities can be detected automatically. While techniques from software testing can accelerate the search for security flaws, in the general case discovery of vulnerabilities is a tedious process that requires significant expertise and time. In this paper, we propose a method for assisted discovery of vulnerabilities in source code. Our method proceeds by embedding code in a vector space and automatically determining API usage patterns using machine learning. Starting from a known vulnerability, these patterns can be exploited to guide the auditing of code and to identify potentially vulnerable code with similar characteristics--a process we refer to as vulnerability extrapolation. We empirically demonstrate the capabilities of our method in different experiments. In a case study with the library FFmpeg, we are able to narrowthe search for interesting code from 6,778 to 20 functions and discover two security flaws, one being a known flaw and the other constituting a zero-day vulnerability.