Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Latent semantic indexing: a probabilistic analysis
Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
A vector space model for automatic indexing
Communications of the ACM
Characterizing the behavior of a program using multiple-length N-grams
Proceedings of the 2000 workshop on New security paradigms
Random projection in dimensionality reduction: applications to image and text data
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Modern Information Retrieval
Data Mining Methods for Detection of New Malicious Executables
SP '01 Proceedings of the 2001 IEEE Symposium on Security and Privacy
N-Gram-Based Detection of New Malicious Code
COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts - Volume 02
Learning similarity measures in non-orthogonal space
Proceedings of the thirteenth ACM international conference on Information and knowledge management
A Feature Selection and Evaluation Scheme for Computer Virus Detection
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Learning to Detect and Classify Malicious Executables in the Wild
The Journal of Machine Learning Research
Proceedings of the 47th Annual Southeast Regional Conference
Biologically inspired defenses against computer viruses
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Using randomized projection techniques to aid in detecting high-dimensional malicious applications
Proceedings of the 49th Annual Southeast Regional Conference
Proceedings of the 50th Annual Southeast Regional Conference
Hi-index | 0.00 |
This research paper describes an on-going effort to design, develop and improve upon malicious application detection algorithms. This work looks specifically at improving a cosine similarity, information retrieval technique to enhance detection of known and variances of known malicious applications by applying the feature extraction technique known as randomized projection. Document similarity techniques, such as cosine similarity, have been used with great success in several document retrieval applications. By following a standard information retrieval methodology, software, in machine readable format, can be regarded as documents in the corpus. These "documents" may or may not have a known malicious functionality. The query is software, again in machine readable format, which contains a certain type of malicious software. This methodology provides an ability to search the corpus with a query and retrieve/identify potentially malicious software as well as other instances of the same type of vulnerability. Retrieval is based on the similarity of the query to a given document in the corpus. There have been several efforts to overcome what is known as 'the curse of dimensionality' that can occur with the use of this type of information retrieval technique including mutual information and randomized projections. Randomized projections are used to create a low-order embedding of the high dimensional data. Results from experimentation have shown promise over previously published efforts.