This research synthesizes a taxonomy for classifying methods that use Machine Learning (ML) to detect new malicious code based on static features extracted from executables. The taxonomy is then used to classify existing research on this topic and to pinpoint critical open research issues in light of emerging threats. The article addresses several facets of the detection challenge: file representation and feature selection methods, classification algorithms, weighting ensembles, the imbalance problem, active learning, and chronological evaluation. From the survey we conclude that a framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while maintaining a low false-positive rate (i.e., rarely misclassifying benign files as malicious). The framework should train multiple classifiers on various types of features (mainly OpCode n-grams, byte n-grams, and Portable Executable features), apply a weighting algorithm to the classification results of the individual classifiers, and include an active learning mechanism to maintain high detection accuracy. Classifier training should also account for the imbalance problem, so that the resulting classifiers perform accurately in real-life conditions, where malicious files are estimated to make up approximately 10% of all files.
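To make three of the ingredients named in the abstract more concrete, the following is a minimal Python sketch of byte n-gram extraction, a weighted vote over individual classifier outputs, and sizing a data set to roughly 10% malicious files. It is an illustration under stated assumptions, not the framework surveyed in the article: the function names (`byte_ngrams`, `weighted_vote`, `malicious_count_for_ratio`), the n-gram length of 4, the 0.5 decision threshold, and the idea of deriving weights from something like validation accuracy are all hypothetical choices made here for the example.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count overlapping byte n-grams in the raw contents of a file.

    The n-gram length is an assumption for illustration only.
    """
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def weighted_vote(scores, weights, threshold=0.5):
    """Combine per-classifier 'malicious' scores (each in [0, 1]).

    Weights are assumed to come from some external measure of each
    classifier's reliability (for example, validation accuracy).
    Returns the weighted average score and the resulting decision.
    """
    combined = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return combined, combined >= threshold


def malicious_count_for_ratio(n_benign: int, malicious_fraction: float = 0.10) -> int:
    """Number of malicious files to pair with n_benign benign files so
    that malicious files make up the requested fraction of the set."""
    return round(malicious_fraction * n_benign / (1.0 - malicious_fraction))


if __name__ == "__main__":
    # Toy inputs; a real pipeline would read executable files from disk.
    print(byte_ngrams(b"ABABA", n=2))
    print(weighted_vote([0.9, 0.2, 0.6], [2, 1, 1]))
    print(malicious_count_for_ratio(90))
```

In a fuller pipeline the n-gram counts would be passed through feature selection before training, and each score fed to `weighted_vote` would come from a classifier trained on one feature type (OpCode n-grams, byte n-grams, or Portable Executable features).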