Previous efforts to use machine learning for malware detection have assumed that the malware population is stationary, i.e., that the probability distribution of the observed characteristics (features) of the population does not change over time. In this paper, we investigate this assumption, treating malware families as populations. Malware, by design, constantly evolves to defeat detection, and this evolution may lead to a nonstationary malware population. In machine learning, the problem of nonstationary populations is called concept drift. Tracking concept drift is critical to the successful application of ML-based methods for malware detection. If evolution causes the malware population to drift rapidly, frequent retraining of classifiers may be required to prevent degradation in performance. On the other hand, if the drift is found to be negligible, ML-based methods remain robust for such populations over long periods of time. We propose two measures for tracking concept drift in malware families when feature sets are very large: relative temporal similarity and metafeatures. We illustrate the use of the proposed measures with a study of more than 3,500 samples from three families of x86 malware, spanning over five years. The results of the study show negligible drift in mnemonic 2-grams extracted from unpacked versions of the samples. The measures can likewise be applied to track drift in any number of malware families. Tracking drift in this manner also provides a novel method for feature type selection: use the feature type that drifts the least.
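The idea of tracking drift in mnemonic 2-grams over time can be illustrated with a minimal sketch. The abstract does not define relative temporal similarity or metafeatures, so the code below is not the paper's measure: it assumes, purely for illustration, that each time period is summarized by its summed 2-gram counts and that later periods are compared with the first period by cosine similarity. The function names and the toy mnemonic sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def mnemonic_bigrams(mnemonics):
    """Count adjacent mnemonic pairs (2-grams) in one disassembled sample."""
    return Counter(zip(mnemonics, mnemonics[1:]))


def period_profile(samples):
    """Sum the 2-gram counts of all samples observed in one time period."""
    profile = Counter()
    for mnemonics in samples:
        profile.update(mnemonic_bigrams(mnemonics))
    return profile


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_baseline(periods):
    """Similarity of each later period's profile to the first period's.

    Assumption: cosine similarity stands in for the paper's measure,
    which the abstract does not specify.
    """
    profiles = [period_profile(samples) for samples in periods]
    baseline = profiles[0]
    return [cosine_similarity(baseline, p) for p in profiles[1:]]


# Hypothetical toy data: one list of mnemonic sequences per time period.
periods = [
    [["push", "mov", "call", "ret"], ["push", "mov", "sub", "ret"]],
    [["push", "mov", "call", "ret"]],
    [["xor", "jmp", "xor", "jmp"]],
]
scores = similarity_to_baseline(periods)
print(scores)
```

Under these assumptions, scores that stay close to 1 across periods would suggest little drift in the chosen feature type, while a falling score would suggest that classifiers trained on the early period may need retraining. Running the same comparison for several feature types and preferring the one whose scores fall least mirrors the feature type selection idea stated in the abstract.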