Applying random projection to the classification of malicious applications using data mining algorithms

Authors:
Jan Durand;Travis Atkison
Affiliations:
Louisiana Tech University, Ruston, LA;Louisiana Tech University, Ruston, LA
Venue:
Proceedings of the 50th Annual Southeast Regional Conference
Year:
2012

Citing 22

Rogue programs: viruses, worms and Trojan horses.
C4.5: programs for machine learning.
Latent semantic indexing: a probabilistic analysis. Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems.
Characterizing the behavior of a program using multiple-length N-grams. Proceedings of the 2000 workshop on New security paradigms.
Database-friendly random projections. PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Random projection in dimensionality reduction: applications to image and text data. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
Modern Information Retrieval.
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning.
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Software forensics: old methods for a new science. SEEP '96 Proceedings of the 1996 International Conference on Software Engineering: Education and Practice (SE:EP '96).
Learning to detect malicious executables in the wild. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
N-Gram-Based Detection of New Malicious Code. COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts - Volume 02.
Detection and identification of network anomalies using sketch subspaces. Proceedings of the 6th ACM SIGCOMM conference on Internet measurement.
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems).
Learning to Detect and Classify Malicious Executables in the Wild. The Journal of Machine Learning Research.
McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables. ACSAC '08 Proceedings of the 2008 Annual Computer Security Applications Conference.
Applying randomized projection to aid prediction algorithms in detecting high-dimensional rogue applications. Proceedings of the 47th Annual Southeast Regional Conference.
Biologically inspired defenses against computer viruses. IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1.
Aiding prediction algorithms in detecting high-dimensional malicious applications using a randomized projection technique. Proceedings of the 48th Annual Southeast Regional Conference.
Using randomized projection techniques to aid in detecting high-dimensional malicious applications. Proceedings of the 49th Annual Southeast Regional Conference.
Shared information and program plagiarism detection. IEEE Transactions on Information Theory.

Abstract

This research is part of a continuing effort to show that random projection is a viable feature extraction and reduction technique for malware classification, one that yields more accurate classifiers. In this paper, we use a vector space model with n-gram analysis to produce weighted feature vectors from binary executables. We then reduce these vectors to a smaller feature set in two ways, producing two separate data sets: one using the random projection method proposed by Achlioptas, and one using mutual information feature selection. We apply several popular machine learning algorithms, including the J48 decision tree, naïve Bayes, support vector machines, and an instance-based learner, to both data sets to produce classifiers for detecting malicious executables. Evaluating these classifiers, we find that a data set reduced by random projection can improve the performance of the support vector machine and instance-based learner classifiers.
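
The reduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes byte-level 2-grams, raw term-frequency weights (the abstract does not say which weighting scheme was used), and the sparse variant of the Achlioptas projection, in which each matrix entry is sqrt(3) times +1, 0, or -1 with probabilities 1/6, 2/3, and 1/6. All function names and the sample byte strings are hypothetical.

```python
import math
import random
from collections import Counter


def ngram_counts(data, n=2):
    """Count overlapping byte n-grams in a bytes object."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def build_vectors(samples, n=2):
    """Return (vocabulary, term-frequency vectors) for a list of byte strings.

    Assumption: raw counts stand in for the paper's unspecified weighting.
    """
    counts = [ngram_counts(s, n) for s in samples]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[g]) for g in vocab] for c in counts]


def achlioptas_matrix(d, k, seed=0):
    """Build a d x k matrix with entries sqrt(3) * {+1, 0, -1}.

    The three values are drawn with probabilities 1/6, 2/3, 1/6.
    """
    rng = random.Random(seed)
    s = math.sqrt(3.0)
    return [rng.choices((s, 0.0, -s), weights=(1, 4, 1), k=k) for _ in range(d)]


def project(vectors, R):
    """Map each d-dimensional vector to k dimensions: (1 / sqrt(k)) * v * R."""
    k = len(R[0])
    scale = 1.0 / math.sqrt(k)
    projected = []
    for vec in vectors:
        out = [0.0] * k
        for x, row in zip(vec, R):
            if x:  # skip zero features; n-gram vectors are mostly sparse
                for j, r in enumerate(row):
                    out[j] += x * r
        projected.append([scale * v for v in out])
    return projected


# Hypothetical toy "executables": two PE-like headers and one ELF-like header.
samples = [b"MZ\x90\x00\x03\x00", b"MZ\x90\x00\x04\x00", b"\x7fELF\x02\x01"]
vocab, tf = build_vectors(samples, n=2)
R = achlioptas_matrix(len(vocab), k=4, seed=42)
reduced = project(tf, R)
```

The reduced vectors in `reduced` would then be passed to a classifier such as a support vector machine or an instance-based learner. Because two thirds of the matrix entries are zero and the rest differ only in sign, the projection needs only additions and subtractions before the final scaling.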