Beyond heuristics: learning to classify vulnerabilities and predict exploits

Authors:
Mehran Bozorgi;Lawrence K. Saul;Stefan Savage;Geoffrey M. Voelker
Affiliations:
UCSD / Google, San Diego, USA;UCSD, San Diego, USA;UCSD, San Diego, USA;UCSD, San Diego, USA
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

The security demands on modern system administration are enormous and getting worse. Chief among these demands is monitoring the continual disclosure of software vulnerabilities that have the potential to compromise systems in some way. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed, well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur downtime or unforeseen side effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high-dimensional feature vectors that we extract from the text fields, time stamps, cross references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.
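The pipeline the abstract describes (turn each disclosure report into a sparse feature vector, then train a classifier on past exploit outcomes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the report fields (`description`, `references`, `published`), the four toy reports and their labels, and the use of a plain perceptron as a stand-in for whatever linear classifier the authors actually trained.

```python
from collections import defaultdict


def extract_features(report):
    """Map one disclosure report (a dict with hypothetical fields) to a sparse binary vector."""
    features = {"bias": 1.0}
    # Bag-of-words tokens from the free-text description.
    for token in report["description"].lower().split():
        features["desc:" + token.strip(".,;()")] = 1.0
    # One indicator per cross-reference source.
    for ref in report["references"]:
        features["ref:" + ref] = 1.0
    # Coarse time-stamp feature: the publication year.
    features["year:" + report["published"][:4]] = 1.0
    return features


def score(weights, features):
    """Linear score; a positive value means 'predicted to be exploited'."""
    return sum(weights[name] * value for name, value in features.items())


def train_perceptron(examples, epochs=100):
    """Train a perceptron on (features, label) pairs with labels +1 / -1."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in examples:
            if label * score(weights, features) <= 0:  # misclassified: update
                for name, value in features.items():
                    weights[name] += label * value
    return weights


# Toy, invented reports (NOT real vulnerability data): +1 = exploited, -1 = not.
TOY_REPORTS = [
    ({"description": "Buffer overflow in the FTP server allows remote code execution.",
      "references": ["bugtraq", "vendor"], "published": "2008-03-01"}, +1),
    ({"description": "Improperly validated input allows remote attackers to execute code.",
      "references": ["bugtraq"], "published": "2008-06-12"}, +1),
    ({"description": "Local users can read a temporary log file.",
      "references": ["vendor"], "published": "2007-11-20"}, -1),
    ({"description": "Minor information disclosure in a local debug utility.",
      "references": ["vendor"], "published": "2008-01-05"}, -1),
]

training_set = [(extract_features(report), label) for report, label in TOY_REPORTS]
model = train_perceptron(training_set)

new_report = {"description": "Remote attackers can overflow a buffer in the mail server.",
              "references": ["bugtraq"], "published": "2009-02-02"}
prediction = 1 if score(model, extract_features(new_report)) > 0 else -1
print("predicted exploited" if prediction == 1 else "predicted not exploited")
```

A linear model over sparse indicator features is a natural fit for this kind of input, since each report activates only a handful of the many possible text, reference, and date features. Predicting how soon an exploit appears would need a different label (for example, whether an exploit appeared within a fixed number of days), which this sketch does not attempt.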