Aiding prediction algorithms in detecting high-dimensional malicious applications using a randomized projection technique

  • Authors: Travis Atkison
  • Affiliations: Louisiana Tech University, Ruston, LA
  • Venue: Proceedings of the 48th Annual Southeast Regional Conference
  • Year: 2010

Abstract

This paper describes an ongoing effort to design, develop, and improve malicious application detection algorithms. The work focuses on improving a cosine similarity information retrieval technique so that it better detects known malicious applications and variants of them, by applying the feature extraction technique known as randomized projection. Document similarity techniques, such as cosine similarity, have been used with great success in several document retrieval applications. Under a standard information retrieval methodology, software in machine-readable format can be treated as the documents in a corpus. These "documents" may or may not have known malicious functionality. The query is also software in machine-readable format, and it contains a certain type of malicious software. This methodology makes it possible to search the corpus with a query and retrieve or identify potentially malicious software, as well as other instances of the same type of vulnerability. Retrieval is based on the similarity of the query to a given document in the corpus. Several approaches, including mutual information and randomized projections, have been used to overcome the "curse of dimensionality" that can arise with this type of information retrieval technique. Here, randomized projections are used to create a low-order embedding of the high-dimensional data. Experimental results have shown promise relative to previously published efforts.
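The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The abstract does not say how software is turned into feature vectors or which random distribution is used, so the sketch assumes that each program has already been reduced to a numeric feature vector (for example, counts of some hypothetical features) and that the projection matrix has Gaussian entries scaled by 1/sqrt(k). All function names and the toy data are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def make_projection(d, k, seed=0):
    """Build a k x d random matrix. Assumption: Gaussian entries scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Map a d-dimensional vector to k dimensions with a matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def rank_corpus(query, corpus, matrix):
    """Return (name, similarity) pairs, most similar to the query first.

    Similarity is computed in the projected (low-dimensional) space.
    """
    q = project(matrix, query)
    scored = [
        (name, cosine_similarity(q, project(matrix, vec)))
        for name, vec in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 6 made-up feature counts per program.
    corpus = {
        "sample_a": [4.0, 0.0, 1.0, 0.0, 3.0, 0.0],
        "sample_b": [0.0, 5.0, 0.0, 2.0, 0.0, 1.0],
        "sample_c": [3.0, 0.0, 2.0, 0.0, 4.0, 0.0],
    }
    query = [4.0, 0.0, 1.0, 0.0, 2.0, 0.0]  # stands in for a known malicious sample
    matrix = make_projection(d=6, k=3, seed=42)
    for name, score in rank_corpus(query, corpus, matrix):
        print(f"{name}: {score:.3f}")
```

Because the projection is random, similarities in the reduced space only approximate those in the original space, and the approximation generally improves as k grows. In practice the same matrix must be applied to both the query and every document in the corpus, which is why the sketch builds it once and passes it to `rank_corpus`.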