Tracking concept drift in malware families

Authors:
Anshuman Singh; Andrew Walenstein; Arun Lakhotia
Affiliations:
University of Louisiana at Lafayette, Lafayette, LA, USA (all authors)
Venue:
Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence
Year:
2012

Abstract

Previous efforts to apply machine learning (ML) to malware detection have assumed that the malware population is stationary, i.e., that the probability distribution of the observed characteristics (features) of malware populations does not change over time. In this paper, we investigate this assumption, treating malware families as populations. Malware, by design, constantly evolves so as to defeat detection, and this evolution may make a malware population nonstationary. In machine learning, the problem of nonstationary populations is called concept drift. Tracking concept drift is critical to the successful application of ML-based methods to malware detection. If evolution causes a malware population to drift rapidly, classifiers may need frequent retraining to keep their performance from degrading. On the other hand, if the drift is found to be negligible, ML-based methods remain robust for such populations over long periods of time. We propose two measures for tracking concept drift in malware families when the feature sets are very large: relative temporal similarity and metafeatures. We illustrate the use of the proposed measures with a study of over 3,500 samples from three families of x86 malware spanning more than five years. The results show negligible drift in 2-grams of instruction mnemonics extracted from unpacked versions of the samples. The measures can likewise be applied to track drift in any number of malware families. Tracking drift in this manner also provides a novel method for feature type selection: use the feature type that drifts the least.
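The abstract does not spell out how relative temporal similarity or metafeatures are computed, so the Python sketch below does not reproduce either measure. It only illustrates the general idea of tracking drift in mnemonic n-grams and choosing the least-drifting feature type, under loudly stated assumptions: each sample is (hypothetically) represented as the list of instruction mnemonics from its unpacked disassembly, samples are already grouped into time windows, n-gram counts are summed per window, and drift is approximated as one minus the mean cosine similarity between the earliest window and each later window. Cosine similarity is a swapped-in stand-in, and the names `mnemonic_ngrams`, `window_profile`, `drift_score`, and `select_feature_type` are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, List, Sequence

# Assumption: a sample is the sequence of instruction mnemonics taken from its
# unpacked disassembly, e.g. ["push", "mov", "call"].
Sample = Sequence[str]


def mnemonic_ngrams(mnemonics: Sample, n: int = 2) -> Counter:
    """Count the n-grams (runs of n consecutive mnemonics) in one sample."""
    return Counter(
        tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)
    )


def window_profile(samples: Iterable[Sample], n: int = 2) -> Counter:
    """Sum the n-gram counts of every sample that falls in one time window."""
    profile: Counter = Counter()
    for sample in samples:
        profile.update(mnemonic_ngrams(sample, n))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def drift_score(windows: List[List[Sample]], n: int = 2) -> float:
    """Hypothetical drift indicator: 1 - mean cosine similarity between the
    earliest (baseline) window and each later window. 0.0 means the n-gram
    distribution did not move at all."""
    if len(windows) < 2:
        raise ValueError("need at least two time windows")
    profiles = [window_profile(w, n) for w in windows]
    baseline = profiles[0]
    sims = [cosine(baseline, p) for p in profiles[1:]]
    return 1.0 - sum(sims) / len(sims)


def select_feature_type(windows: List[List[Sample]], ns=(1, 2, 3)) -> int:
    """Feature type selection: return the n-gram length that drifts the least."""
    return min(ns, key=lambda n: drift_score(windows, n))
```

For example, with `windows` holding one list of mnemonic sequences per year, `drift_score(windows, n=2)` near 0.0 would correspond to the negligible-drift case, and `select_feature_type(windows, ns=(1, 2, 3))` picks the n-gram length with the smallest score. Comparing against a fixed baseline window mimics a classifier trained once on early samples; comparing consecutive windows instead would emphasize short-term change.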