This work is part of an ongoing effort to use randomized projection as a feature extraction and reduction method that improves a cosine-similarity information retrieval technique, with the goal of enhancing the detection of known malicious applications and their variations. We follow a standard information retrieval methodology in which software is treated as documents in a corpus. This makes it possible to search the corpus with a query, namely a piece of malicious software, and to retrieve or identify potentially malicious software and other instances of the same type of vulnerability. In our experiments, we compare randomized projection based on a Gaussian-distributed random matrix with two alternatives, sparse-matrix randomized projection and Linial-London-Rabinovich random-set randomized projection, and assess their performance when applied to features of malicious applications extracted via n-gram analysis, an information retrieval technique. In our results, the Gaussian-distributed random matrix approach outperformed the other methods, with generally higher values for each observed performance metric; however, each algorithm showed promise in selected scenarios. These results support the hypothesis that random matrix projection, applied as a dimensionality reduction method for the cosine similarity metric, has merit in determining whether an application may contain malicious software.
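The pipeline described above (n-gram features, Gaussian random projection, cosine similarity against a query) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the choice of byte-level 2-grams, the target dimension `k = 32`, and the N(0, 1/k) scaling of the matrix entries are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def byte_ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a binary blob (assumed feature type)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_vector(counts: Counter, vocabulary: list) -> list:
    """Lay the n-gram counts out in a fixed vocabulary order."""
    return [float(counts.get(gram, 0)) for gram in vocabulary]


def gaussian_projection_matrix(d: int, k: int, seed: int = 0) -> list:
    """Build a k x d matrix with i.i.d. N(0, 1/k) entries (assumed scaling)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix: list, vector: list) -> list:
    """Map a d-dimensional vector to k dimensions via matrix-vector product."""
    return [sum(r * v for r, v in zip(row, vector)) for row in matrix]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_corpus(query: bytes, corpus: dict, n: int = 2, k: int = 32,
                seed: int = 0) -> list:
    """Rank corpus entries by cosine similarity to the query in projected space."""
    counts = {name: byte_ngram_counts(blob, n) for name, blob in corpus.items()}
    query_counts = byte_ngram_counts(query, n)
    # Shared vocabulary: every n-gram seen in the query or in any corpus entry.
    vocabulary = sorted(set(query_counts).union(*counts.values()))
    matrix = gaussian_projection_matrix(len(vocabulary), k, seed)
    q = project(matrix, to_vector(query_counts, vocabulary))
    scores = [
        (name, cosine_similarity(q, project(matrix, to_vector(c, vocabulary))))
        for name, c in counts.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Calling `rank_corpus(sample_bytes, {"a.exe": ..., "b.exe": ...})` returns `(name, score)` pairs in descending order of similarity to the query. In this toy setting the vocabulary is small, so the projection may not reduce the dimension at all. The technique is meant for n-gram vocabularies far larger than `k`. A sparse-matrix variant would change only `gaussian_projection_matrix`.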