We describe the use of machine learning and data mining to detect and classify malicious executables as they appear in the wild. We gathered 1,971 benign and 1,651 malicious executables and encoded each as a training example using n-grams of byte codes as features. This encoding produced more than 255 million distinct n-grams. After selecting the most relevant n-grams for prediction, we evaluated a variety of inductive methods, including naive Bayes, decision trees, support vector machines, and boosting. Boosted decision trees outperformed the other methods, with an area under the ROC curve of 0.996. Results suggest that our methodology will scale to larger collections of executables. We also evaluated how well the methods classified executables based on the function of their payload, such as opening a backdoor or mass-mailing. Areas under the ROC curve for detecting payload function were around 0.9, lower than those for the detection task. However, we attribute this drop in performance to fewer training examples and to the challenge of obtaining properly labeled examples, rather than to a failing of the methodology or to some inherent difficulty of the classification task. Finally, we applied detectors to 291 malicious executables discovered after we gathered our original collection, and boosted decision trees achieved a true-positive rate of 0.98 at a desired false-positive rate of 0.05. This result is particularly important because it suggests that our methodology could serve as the basis for an operational system for detecting previously undiscovered malicious executables.
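The abstract describes the pipeline only in prose. The Python sketch below illustrates its main steps under stated assumptions. It encodes each executable as the set of byte n-grams it contains, keeps the k n-grams that best separate the classes, and scores a classifier both by area under the ROC curve and by the true-positive rate reachable at a target false-positive rate. The abstract does not give n, k, or the relevance criterion. Here n = 4 and information gain are assumptions. All function names (`byte_ngrams`, `select_top_k`, `tpr_at_fpr`, and so on) and the toy byte strings are hypothetical. The learners themselves (naive Bayes, decision trees, SVMs, boosting) are out of scope; the sum of features in the demo merely stands in for a trained model's score.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Distinct n-byte sequences in `data`; presence only, counts are discarded."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary class entropy in bits; 0.0 for an empty or pure split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(present_pos: int, present_neg: int,
                     total_pos: int, total_neg: int) -> float:
    """Reduction in class entropy from splitting on one binary feature (assumed criterion)."""
    total = total_pos + total_neg
    n_present = present_pos + present_neg
    n_absent = total - n_present
    return (entropy(total_pos, total_neg)
            - n_present / total * entropy(present_pos, present_neg)
            - n_absent / total * entropy(total_pos - present_pos,
                                         total_neg - present_neg))


def select_top_k(examples: Sequence[Set[bytes]], labels: Sequence[int],
                 k: int) -> List[bytes]:
    """Rank every observed n-gram by information gain and keep the top k.

    Ties are broken by byte order so the result is deterministic.
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    pos_counts: Counter = Counter()
    neg_counts: Counter = Counter()
    for grams, y in zip(examples, labels):
        # Count how many files of each class contain each n-gram.
        (pos_counts if y == 1 else neg_counts).update(grams)
    vocab = set(pos_counts) | set(neg_counts)
    ranked = sorted(
        vocab,
        key=lambda g: (-information_gain(pos_counts[g], neg_counts[g],
                                         total_pos, total_neg), g))
    return ranked[:k]


def to_vector(grams: Set[bytes], selected: Sequence[bytes]) -> List[int]:
    """Binary feature vector: 1 if the selected n-gram occurs in the file."""
    return [1 if g in grams else 0 for g in selected]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float) -> float:
    """Best true-positive rate over thresholds whose false-positive rate <= max_fpr."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in set(scores):
        # Predict "malicious" when score >= t.
        if sum(q >= t for q in neg) / len(neg) <= max_fpr:
            best = max(best, sum(p >= t for p in pos) / len(pos))
    return best


if __name__ == "__main__":
    # Hypothetical toy "executables"; real inputs would be file contents.
    malicious = [b"MZ\x90\x00EVILPAYLOAD", b"MZ\x90\x00EVILCODE"]
    benign = [b"MZ\x90\x00HELLOWORLD", b"MZ\x90\x00GOODBYE"]
    examples = [byte_ngrams(x) for x in malicious + benign]
    labels = [1, 1, 0, 0]
    selected = select_top_k(examples, labels, k=3)
    vectors = [to_vector(g, selected) for g in examples]
    scores = [sum(v) for v in vectors]  # stand-in for a trained classifier
    print(selected, vectors)
    print(roc_auc(scores, labels), tpr_at_fpr(scores, labels, 0.05))
```

Recording presence rather than counts keeps every vector binary, which suits all four learner families named above. With more than 255 million distinct n-grams, as reported, the in-memory `Counter` objects used here would likely need to be replaced by disk-backed or streaming counts in practice.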